Preface


“We see that the theory of probability is at bottom only common sense reduced to 
calculation; it makes us appreciate with exactitude what reasonable minds feel by a 
sort of instinct, often without being able to account for it... It is remarkable that this 
science, which originated in the consideration of games of chance, should have 
become the most important object of human knowledge.... The most important 
questions of life are, for the most part, really only problems of probability.” So said 
the famous French mathematician and astronomer (the “Newton of France”) Pierre- 
Simon, Marquis de Laplace. Although many people believe that the famous marquis, 
who was also one of the great contributors to the development of probability, might 
have exaggerated somewhat, it is nevertheless true that probability theory has 
become a tool of fundamental importance to nearly all scientists, engineers, medical 
practitioners, jurists, and industrialists. In fact, the enlightened individual had learned 
to ask not “Is it so?” but rather “What is the probability that it is so?” 


General Approach and Mathematical Level 


This book is intended as an elementary introduction to the theory of probability for 
students in mathematics, statistics, engineering, and the sciences (including
computer science, biology, the social sciences, and management science) who
possess the prerequisite knowledge of elementary calculus. It attempts to present 
not only the mathematics of probability theory, but also, through numerous examples, 
the many diverse possible applications of this subject. 


Content and Course Planning 


Chapter 1 presents the basic principles of combinatorial analysis, which are most
useful in computing probabilities. 


Chapter 2 handles the axioms of probability theory and shows how they can be 
applied to compute various probabilities of interest. 


Chapter 3 deals with the extremely important subjects of conditional probability
and independence of events. By a series of examples, we illustrate how conditional 
probabilities come into play not only when some partial information is available, but 
also as a tool to enable us to compute probabilities more easily, even when no partial 
information is present. This extremely important technique of obtaining probabilities 
by “conditioning” reappears in Chapter 7, where we use it to obtain expectations.


The concept of random variables is introduced in Chapters 4, 5, and 6.
Discrete random variables are dealt with in Chapter 4, continuous random
variables in Chapter 5, and jointly distributed random variables in Chapter 6.

The important concepts of the expected value and the variance of a random variable
are introduced in Chapters 4 and 5, and these quantities are then determined
for many of the common types of random variables.


Additional properties of the expected value are considered in Chapter 7. Many 
examples illustrating the usefulness of the result that the expected value of a sum of 
random variables is equal to the sum of their expected values are presented. 
Sections on conditional expectation, including its use in prediction, and on moment- 
generating functions are contained in this chapter. In addition, the final section 
introduces the multivariate normal distribution and presents a simple proof 
concerning the joint distribution of the sample mean and sample variance of a 
sample from a normal distribution. 


Chapter 8 presents the major theoretical results of probability theory. In particular, 
we prove the strong law of large numbers and the central limit theorem. Our proof of 
the strong law is a relatively simple one that assumes that the random variables have 
a finite fourth moment, and our proof of the central limit theorem assumes Levy’s 
continuity theorem. This chapter also presents such probability inequalities as 
Markov’s inequality, Chebyshev’s inequality, and Chernoff bounds. The final section
of Chapter 8 gives a bound on the error involved when a probability concerning a
sum of independent Bernoulli random variables is approximated by the 
corresponding probability of a Poisson random variable having the same expected 
value. 


Chapter 9 presents some additional topics, such as Markov chains, the Poisson
process, and an introduction to information and coding theory, and Chapter 10 
considers simulation. 


As in the previous edition, three sets of exercises are given at the end of each 
chapter. They are designated as Problems, Theoretical Exercises, and Self-Test 
Problems and Exercises. This last set of exercises, for which complete solutions 
appear in Solutions to Self-Test Problems and Exercises, is designed to help 
students test their comprehension and study for exams. 


Changes for the Tenth Edition 


The tenth edition continues the evolution and fine tuning of the text. Aside from a 
multitude of small changes made to increase the clarity of the text, the new edition 
includes many new and updated problems, exercises, and text material chosen both 
for inherent interest and for their use in building student intuition about probability. 
Illustrative of these goals are Example 4n of Chapter 3, which deals with
computing NCAA basketball tournament win probabilities, and Example 5b of
Chapter 4, which introduces the friendship paradox. There is also new material
on the Pareto distribution (introduced in Section 5.6.5), on Poisson limit results
(in Section 8.5), and on the Lorenz curve (in Section 8.7).


Chapter 1 Combinatorial Analysis


Contents 


1.1 Introduction 

1.2 The Basic Principle of Counting 
1.3 Permutations 

1.4 Combinations 

1.5 Multinomial Coefficients 


1.6 The Number of Integer Solutions of Equations 


1.1 Introduction 


Here is a typical problem of interest involving probability: A communication system is 
to consist of n seemingly identical antennas that are to be lined up in a linear order.
The resulting system will then be able to receive all incoming signals and will be
called functional as long as no two consecutive antennas are defective. If it turns out 
that exactly m of the n antennas are defective, what is the probability that the 
resulting system will be functional? For instance, in the special case where n = 4 and 
m = 2, there are 6 possible system configurations, namely, 


0 1 1 0
0 1 0 1
1 0 1 0
0 0 1 1
1 0 0 1
1 1 0 0


where 1 means that the antenna is working and 0 that it is defective. Because the
resulting system will be functional in the first 3 arrangements and not functional in the
remaining 3, it seems reasonable to take 3/6 = 1/2 as the desired probability. In the case
of general n and m, we could compute the probability that the system is functional in
a similar fashion. That is, we could count the number of configurations that result in
the system's being functional and then divide by the total number of all possible
configurations.
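
The counting argument just described can be sketched in a few lines of Python. This is only an illustration: the function name `functional_probability` is our own, and the brute-force enumeration is practical only for small n.

```python
from fractions import Fraction
from itertools import product

def functional_probability(n, m):
    """Probability that no two defective antennas (0s) are adjacent,
    assuming all arrangements with exactly m defectives are equally likely."""
    # All 0/1 strings of length n with exactly m zeros (requires 0 <= m <= n).
    configs = ["".join(bits) for bits in product("01", repeat=n)
               if bits.count("0") == m]
    # A configuration is functional when it has no two consecutive zeros.
    functional = [c for c in configs if "00" not in c]
    return Fraction(len(functional), len(configs))

print(functional_probability(4, 2))  # 1/2
```

For n = 4 and m = 2 the enumeration finds 3 functional arrangements out of 6, in agreement with the value 1/2 obtained in the text.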


From the preceding discussion, we see that it would be useful to have an effective 
method for counting the number of ways that things can occur. In fact, many 
problems in probability theory can be solved simply by counting the number of 
different ways that a certain event can occur. The mathematical theory of counting is 
formally known as combinatorial analysis. 


1.2 The Basic Principle of Counting 


The basic principle of counting will be fundamental to all our work. Loosely put, it 
states that if one experiment can result in any of m possible outcomes and if another 
experiment can result in any of n possible outcomes, then there are mn possible 
outcomes of the two experiments. 


The basic principle of counting 


Suppose that two experiments are to be performed. Then if experiment 1 can 
result in any one of m possible outcomes and if, for each outcome of 
experiment 1, there are n possible outcomes of experiment 2, then together 
there are mn possible outcomes of the two experiments. 
Proof of the Basic Principle: The basic principle may be proven by enumerating all
the possible outcomes of the two experiments; that is,

(1, 1), (1, 2), ..., (1, n)
(2, 1), (2, 2), ..., (2, n)
...
(m, 1), (m, 2), ..., (m, n)

where we say that the outcome is (i, j) if experiment 1 results in its ith possible
outcome and experiment 2 then results in its jth possible outcome. Hence, the set of
possible outcomes consists of m rows, each containing n elements. This proves the
result.


Example 2a 


A small community consists of 10 women, each of whom has 3 children. If one 
woman and one of her children are to be chosen as mother and child of the year, 
how many different choices are possible? 


Solution 


By regarding the choice of the woman as the outcome of the first experiment and 
the subsequent choice of one of her children as the outcome of the second 
experiment, we see from the basic principle that there are 10 x 3 = 30 possible 
choices. 


When there are more than two experiments to be performed, the basic principle can 
be generalized. 


The generalized basic principle of counting 


If r experiments that are to be performed are such that the first one may result
in any of $n_1$ possible outcomes; and if, for each of these $n_1$ possible outcomes,
there are $n_2$ possible outcomes of the second experiment; and if, for each of
the possible outcomes of the first two experiments, there are $n_3$ possible
outcomes of the third experiment; and if ..., then there is a total of $n_1 \cdot n_2 \cdots n_r$
possible outcomes of the r experiments.

Example 2b 


A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors, 
and 2 seniors. A subcommittee of 4, consisting of 1 person from each class, is to 
be chosen. How many different subcommittees are possible? 


Solution 


We may regard the choice of a subcommittee as the combined outcome of the 
four separate experiments of choosing a single representative from each of the 
classes. It then follows from the generalized version of the basic principle that
there are 3 x 4 x 5 x 2 = 120 possible subcommittees.


Example 2c 

How many different 7-place license plates are possible if the first 3 places are to 
be occupied by letters and the final 4 by numbers? 

Solution 


By the generalized version of the basic principle, the answer is 
26 · 26 · 26 · 10 · 10 · 10 · 10 = 175,760,000.


Example 2d 


How many functions defined on n points are possible if each functional value is 
either 0 or 1? 


Solution 


Let the points be 1, 2, ..., n. Since f(i) must be either 0 or 1 for each i = 1, 2, ..., n,
it follows that there are $2^n$ possible functions.


Example 2e 


In Example 2c, how many license plates would be possible if repetition
among letters or numbers were prohibited? 


Solution 


In this case, there would be 26 · 25 · 24 · 10 · 9 · 8 · 7 = 78,624,000 possible
license plates. 
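
As a quick check of Examples 2c and 2e, the two products can be evaluated directly. The sketch below assumes Python 3.8 or later, where `math.perm(n, k)` returns n(n - 1)...(n - k + 1); the variable names are ours.

```python
from math import perm

# Example 2c: repetition allowed, so every place is an independent choice.
with_repetition = 26**3 * 10**4

# Example 2e: no repeated letters or digits, so the number of choices
# drops by one at each successive place.
without_repetition = perm(26, 3) * perm(10, 4)

print(with_repetition)     # 175760000
print(without_repetition)  # 78624000
```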


1.3 Permutations 


How many different ordered arrangements of the letters a, b, and c are possible? By 
direct enumeration we see that there are 6, namely, abc, acb, bac, bca, cab, and 
cba. Each arrangement is known as a permutation. Thus, there are 6 possible 
permutations of a set of 3 objects. This result could also have been obtained from 
the basic principle, since the first object in the permutation can be any of the 3, the 
second object in the permutation can then be chosen from any of the remaining 2, 
and the third object in the permutation is then the remaining 1. Thus, there are 
3 · 2 · 1 = 6 possible permutations.


Suppose now that we have n objects. Reasoning similar to that we have just
used for the 3 letters then shows that there are

$$n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1 = n!$$

different permutations of the n objects.

Whereas n! (read as “n factorial”) is defined to equal $1 \cdot 2 \cdots n$ when n is a positive
integer, it is convenient to define 0! to equal 1.


Example 3a 


How many different batting orders are possible for a baseball team consisting of 
9 players? 


Solution 


There are 9! = 362,880 possible batting orders. 


Example 3b 


A class in probability theory consists of 6 men and 4 women. An examination is 
given, and the students are ranked according to their performance. Assume that 
no two students obtain the same score. 


a. How many different rankings are possible? 
b. If the men are ranked just among themselves and the women just among 
themselves, how many different rankings are possible? 


Solution 


a. Because each ranking corresponds to a particular ordered
arrangement of the 10 people, the answer to this part is 10! = 3,628,800.

b. Since there are 6! possible rankings of the men among themselves
and 4! possible rankings of the women among themselves, it follows from
the basic principle that there are (6!)(4!) = (720)(24) = 17,280 possible
rankings in this case.


Example 3c 


Ms. Jones has 10 books that she is going to put on her bookshelf. Of these, 4 are 
mathematics books, 3 are chemistry books, 2 are history books, and 1 is a 
language book. Ms. Jones wants to arrange her books so that all the books 
dealing with the same subject are together on the shelf. How many different 
arrangements are possible? 


Solution 


There are 4! 3! 2! 1! arrangements such that the mathematics books are first in
line, then the chemistry books, then the history books, and then the language
book. Similarly, for each possible ordering of the subjects, there are 4! 3! 2! 1!
possible arrangements. Hence, as there are 4! possible orderings of the subjects,
the desired answer is 4! 4! 3! 2! 1! = 6912.


We shall now determine the number of permutations of a set of n objects when 
certain of the objects are indistinguishable from one another. To set this situation 
straight in our minds, consider the following example. 


Example 3d 


How many different letter arrangements can be formed from the letters PEPPER? 


Solution 


We first note that there are 6! permutations of the letters $P_1, E_1, P_2, P_3, E_2, R$ when the
3 P's and the 2 E's are distinguished from one another. However, consider any one
of these permutations, for instance, $P_1 P_2 E_1 P_3 E_2 R$. If we now permute the P's
among themselves and the E's among themselves, then the resultant
arrangement would still be of the form PPEPER. That is, all 3! 2! permutations

$P_1 P_2 E_1 P_3 E_2 R$    $P_1 P_2 E_2 P_3 E_1 R$
$P_1 P_3 E_1 P_2 E_2 R$    $P_1 P_3 E_2 P_2 E_1 R$
$P_2 P_1 E_1 P_3 E_2 R$    $P_2 P_1 E_2 P_3 E_1 R$
$P_2 P_3 E_1 P_1 E_2 R$    $P_2 P_3 E_2 P_1 E_1 R$
$P_3 P_1 E_1 P_2 E_2 R$    $P_3 P_1 E_2 P_2 E_1 R$
$P_3 P_2 E_1 P_1 E_2 R$    $P_3 P_2 E_2 P_1 E_1 R$

are of the form PPEPER. Hence, there are 6!/(3! 2!) = 60 possible letter
arrangements of the letters PEPPER.
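
The count for PEPPER can be confirmed by brute force. The following sketch (the helper name `distinct_arrangements` is ours) compares the number of distinct orderings found by enumeration with the value given by dividing 6! by the factorials of the repeated-letter counts.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """n! divided by the factorial of each letter's multiplicity."""
    result = factorial(len(word))
    for multiplicity in Counter(word).values():
        result //= factorial(multiplicity)  # each division is exact
    return result

# Enumerate all 6! orderings and keep only the distinct ones.
brute_force = len(set(permutations("PEPPER")))

print(brute_force, distinct_arrangements("PEPPER"))  # 60 60
```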


In general, the same reasoning as that used in Example 3d shows that
there are

$$\frac{n!}{n_1! \, n_2! \cdots n_r!}$$

different permutations of n objects, of which $n_1$ are alike, $n_2$ are alike, ..., $n_r$
are alike.

Example 3e 


A chess tournament has 10 competitors, of which 4 are Russian, 3 are from the 
United States, 2 are from Great Britain, and 1 is from Brazil. If the tournament 
result lists just the nationalities of the players in the order in which they placed, 
how many outcomes are possible?


Solution


There are

$$\frac{10!}{4! \, 3! \, 2! \, 1!} = 12{,}600$$

possible outcomes.


Example 3f 


How many different signals, each consisting of 9 flags hung in a line, can be 
made from a set of 4 white flags, 3 red flags, and 2 blue flags if all flags of the 
same color are identical? 


Solution 


There are

$$\frac{9!}{4! \, 3! \, 2!} = 1260$$

different signals.


1.4 Combinations 


We are often interested in determining the number of different groups of r objects 
that could be formed from a total of n objects. For instance, how many different 
groups of 3 could be selected from the 5 items A, B, C, D, and E? To answer this 
question, reason as follows: Since there are 5 ways to select the initial item, 4 ways
to then select the next item, and 3 ways to select the final item, there are thus 5 · 4 · 3
ways of selecting the group of 3 when the order in which the items are selected is
relevant. However, since every group of 3, say, the group consisting of items A, B,
and C, will be counted 6 times (that is, all of the permutations ABC, ACB, BAC, BCA,
CAB, and CBA will be counted when the order of selection is relevant), it follows that
the total number of groups that can be formed is

$$\frac{5 \cdot 4 \cdot 3}{3 \cdot 2 \cdot 1} = 10$$


In general, as $n(n-1) \cdots (n-r+1)$ represents the number of different ways that a
group of r items could be selected from n items when the order of selection is
relevant, and as each group of r items will be counted r! times in this count, it follows
that the number of different groups of r items that could be formed from a set of n
items is

$$\frac{n(n-1) \cdots (n-r+1)}{r!} = \frac{n!}{(n-r)! \, r!}$$


Notation and terminology

We define $\binom{n}{r}$, for $r \le n$, by

$$\binom{n}{r} = \frac{n!}{(n-r)! \, r!}$$

and say that $\binom{n}{r}$ (read as “n choose r”) represents the number of possible
combinations of n objects taken r at a time.

Thus, $\binom{n}{r}$ represents the number of different groups of size r that could be selected
from a set of n objects when the order of selection is not considered relevant.
Equivalently, $\binom{n}{r}$ is the number of subsets of size r that can be chosen from a set of
size n. Using that 0! = 1, note that $\binom{n}{n} = \binom{n}{0} = \frac{n!}{0! \, n!} = 1$, which is consistent with
the preceding interpretation because in a set of size n there is exactly 1 subset of
size n (namely, the entire set), and exactly one subset of size 0 (namely the empty
set). A useful convention is to define $\binom{n}{r}$ equal to 0 when either r > n or r < 0.


Example 4a 


A committee of 3 is to be formed from a group of 20 people. How many different 
committees are possible? 


Solution 
There are $\binom{20}{3} = \frac{20 \cdot 19 \cdot 18}{3 \cdot 2 \cdot 1} = 1140$ possible committees.


Example 4b


From a group of 5 women and 7 men, how many different committees consisting 
of 2 women and 3 men can be formed? What if 2 of the men are feuding and 
refuse to serve on the committee together? 


Solution


As there are $\binom{5}{2}$ possible groups of 2 women, and $\binom{7}{3}$ possible groups of 3
men, it follows from the basic principle that there are

$$\binom{5}{2}\binom{7}{3} = \frac{5 \cdot 4}{2 \cdot 1} \cdot \frac{7 \cdot 6 \cdot 5}{3 \cdot 2 \cdot 1} = 350$$

possible committees consisting of 2 women and 3 men.

Now suppose that 2 of the men refuse to serve together. Because a total of
$\binom{2}{2}\binom{5}{1} = 5$ out of the $\binom{7}{3} = 35$ possible groups of 3 men contain both of the
feuding men, it follows that there are 35 - 5 = 30 groups that do not contain both
of the feuding men. Because there are still $\binom{5}{2} = 10$ ways to choose the 2
women, there are 30 · 10 = 300 possible committees in this case.
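
Both counts in Example 4b are small enough to verify by listing every committee. In this sketch the labels W1..W5 and M1..M7, and the choice of M1 and M2 as the feuding pair, are arbitrary assumptions made for illustration.

```python
from itertools import combinations

women = [f"W{i}" for i in range(1, 6)]   # 5 women
men = [f"M{i}" for i in range(1, 8)]     # 7 men
feuding = {"M1", "M2"}                   # hypothetical feuding pair

# Every committee: a group of 2 women together with a group of 3 men.
all_committees = [(w, m)
                  for w in combinations(women, 2)
                  for m in combinations(men, 3)]

# Drop committees whose men include both members of the feuding pair.
allowed = [(w, m) for (w, m) in all_committees if not feuding <= set(m)]

print(len(all_committees), len(allowed))  # 350 300
```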


Example 4c 


Consider a set of n antennas of which m are defective and n - m are functional
and assume that all of the defectives and all of the functionals are considered 
indistinguishable. How many linear orderings are there in which no two 
defectives are consecutive? 


Solution 


Imagine that the n - m functional antennas are lined up among themselves. Now,
if no two defectives are to be consecutive, then the spaces between the
functional antennas must each contain at most one defective antenna. That is, in
the n - m + 1 possible positions (represented in Figure 1.1 by carets)
between the n - m functional antennas, we must select m of these in which to
put the defective antennas. Hence, there are $\binom{n-m+1}{m}$ possible orderings in
which there is at least one functional antenna between any two defective ones.


Figure 1.1 No consecutive defectives.
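
The formula of Example 4c can be checked against direct enumeration for small n. The sketch below is only a sanity check under our own naming (`count_no_adjacent_defectives`); it picks the m defective positions in every possible way and keeps those with no two adjacent.

```python
from itertools import combinations
from math import comb

def count_no_adjacent_defectives(n, m):
    """Count placements of m defectives among n positions with no two adjacent."""
    count = 0
    for positions in combinations(range(n), m):
        # positions are increasing, so adjacency means a gap of exactly 1
        if all(b - a > 1 for a, b in zip(positions, positions[1:])):
            count += 1
    return count

# Compare with C(n - m + 1, m) for all small cases.
for n in range(1, 9):
    for m in range(n + 1):
        assert count_no_adjacent_defectives(n, m) == comb(n - m + 1, m)

print(count_no_adjacent_defectives(4, 2))  # 3
```

Note that `math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention for binomial coefficients stated earlier in this section.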


A useful combinatorial identity, known as Pascal's identity, is

$$\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}, \qquad 1 \le r \le n \tag{4.1}$$

Equation (4.1) may be proved analytically or by the following combinatorial
argument: Consider a group of n objects, and fix attention on some particular one of
these objects; call it object 1. Now, there are $\binom{n-1}{r-1}$ groups of size r that contain
object 1 (since each such group is formed by selecting r - 1 from the remaining
n - 1 objects). Also, there are $\binom{n-1}{r}$ groups of size r that do not contain object 1.
As there is a total of $\binom{n}{r}$ groups of size r, Equation (4.1) follows.

The values $\binom{n}{r}$ are often referred to as binomial coefficients because of their
prominence in the binomial theorem.
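
Pascal's identity also suggests a way to compute binomial coefficients using additions only. The following sketch (the function name `pascal_row` is ours) builds row n of Pascal's triangle from row n - 1 and can be compared with Python's built-in `math.comb`.

```python
from math import comb

def pascal_row(n):
    """Return [C(n, 0), ..., C(n, n)] using only Pascal's identity."""
    row = [1]  # row 0
    for _ in range(n):
        # Interior entries: C(k, r) = C(k-1, r-1) + C(k-1, r); the ends are 1.
        row = [1] + [row[r - 1] + row[r] for r in range(1, len(row))] + [1]
    return row

print(pascal_row(4))  # [1, 4, 6, 4, 1]
```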


The binomial theorem

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \tag{4.2}$$


We shall present two proofs of the binomial theorem. The first is a proof by 
mathematical induction, and the second is a proof based on combinatorial 
considerations. 


Proof of the Binomial Theorem by Induction: When n = 1, Equation (4.2)
reduces to

$$x + y = \binom{1}{0} x^0 y^1 + \binom{1}{1} x^1 y^0 = y + x$$

Assume Equation (4.2) for n - 1. Now,

$$\begin{aligned}
(x+y)^n &= (x+y)(x+y)^{n-1} \\
&= (x+y) \sum_{k=0}^{n-1} \binom{n-1}{k} x^k y^{n-1-k} \\
&= \sum_{k=0}^{n-1} \binom{n-1}{k} x^{k+1} y^{n-1-k} + \sum_{k=0}^{n-1} \binom{n-1}{k} x^k y^{n-k}
\end{aligned}$$

Letting i = k + 1 in the first sum and i = k in the second sum, we find that

$$\begin{aligned}
(x+y)^n &= \sum_{i=1}^{n} \binom{n-1}{i-1} x^i y^{n-i} + \sum_{i=0}^{n-1} \binom{n-1}{i} x^i y^{n-i} \\
&= x^n + \sum_{i=1}^{n-1} \left[ \binom{n-1}{i-1} + \binom{n-1}{i} \right] x^i y^{n-i} + y^n \\
&= x^n + \sum_{i=1}^{n-1} \binom{n}{i} x^i y^{n-i} + y^n \\
&= \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}
\end{aligned}$$

where the next-to-last equality follows by Equation (4.1). By induction, the
theorem is now proved.


Combinatorial Proof of the Binomial Theorem: Consider the product

$$(x_1 + y_1)(x_2 + y_2) \cdots (x_n + y_n)$$

Its expansion consists of the sum of $2^n$ terms, each term being the product of n
factors. Furthermore, each of the $2^n$ terms in the sum will contain as a factor either $x_i$
or $y_i$ for each i = 1, 2, ..., n. For example,

$$(x_1 + y_1)(x_2 + y_2) = x_1 x_2 + x_1 y_2 + y_1 x_2 + y_1 y_2$$

Now, how many of the $2^n$ terms in the sum will have k of the $x_i$'s and (n - k) of the
$y_i$'s as factors? As each term consisting of k of the $x_i$'s and (n - k) of the $y_i$'s
corresponds to a choice of a group of k from the n values $x_1, x_2, \ldots, x_n$, there are $\binom{n}{k}$
such terms. Thus, letting $x_i = x$, $y_i = y$, $i = 1, \ldots, n$, we see that

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$


Example 4d 


Expand $(x + y)^3$.


Solution

$$\begin{aligned}
(x+y)^3 &= \binom{3}{0} x^0 y^3 + \binom{3}{1} x^1 y^2 + \binom{3}{2} x^2 y^1 + \binom{3}{3} x^3 y^0 \\
&= y^3 + 3xy^2 + 3x^2 y + x^3
\end{aligned}$$


Example 4e 


How many subsets are there of a set consisting of n elements? 


Solution


Since there are $\binom{n}{k}$ subsets of size k, the desired answer is

$$\sum_{k=0}^{n} \binom{n}{k} = (1+1)^n = 2^n$$

This result could also have been obtained by assigning either the number 0 or
the number 1 to each element in the set. To each assignment of numbers, there
corresponds, in a one-to-one fashion, a subset, namely, that subset consisting of
all elements that were assigned the value 1. As there are $2^n$ possible
assignments, the result follows.


Note that we have included the set consisting of 0 elements (that is, the null set)
as a subset of the original set. Hence, the number of subsets that contain at least
1 element is $2^n - 1$.
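
The 0/1 assignment argument of Example 4e translates directly into code. In this sketch (the function name `subsets_by_indicator` and the four-element set are our own choices), each tuple of 0s and 1s selects the elements marked 1, and the subsets are then tallied by size.

```python
from itertools import product

def subsets_by_indicator(items):
    """Build every subset by assigning 0 or 1 to each element."""
    items = list(items)
    result = []
    for flags in product((0, 1), repeat=len(items)):
        # Keep exactly the elements that were assigned the value 1.
        result.append({x for x, flag in zip(items, flags) if flag == 1})
    return result

subsets = subsets_by_indicator(["a", "b", "c", "d"])
sizes = [len(s) for s in subsets]

print(len(subsets))                        # 16
print([sizes.count(k) for k in range(5)])  # [1, 4, 6, 4, 1]
```

The size tally reproduces the binomial coefficients for n = 4, whose sum is 16.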


1.5 Multinomial Coefficients 


In this section, we consider the following problem: A set of n distinct items is to be
divided into r distinct groups of respective sizes $n_1, n_2, \ldots, n_r$, where $\sum_{i=1}^{r} n_i = n$.
How many different divisions are possible? To answer this question, we note that
there are $\binom{n}{n_1}$ possible choices for the first group; for each choice of the first group,
there are $\binom{n-n_1}{n_2}$ possible choices for the second group; for each choice of the first
two groups, there are $\binom{n-n_1-n_2}{n_3}$ possible choices for the third group; and so on.
It then follows from the generalized version of the basic counting principle that there
are

$$\begin{aligned}
\binom{n}{n_1}\binom{n-n_1}{n_2} \cdots \binom{n-n_1-n_2-\cdots-n_{r-1}}{n_r}
&= \frac{n!}{(n-n_1)! \, n_1!} \cdot \frac{(n-n_1)!}{(n-n_1-n_2)! \, n_2!} \cdots \frac{(n-n_1-n_2-\cdots-n_{r-1})!}{0! \, n_r!} \\
&= \frac{n!}{n_1! \, n_2! \cdots n_r!}
\end{aligned}$$

possible divisions.
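
The telescoping product above can be mirrored step by step in code. This is a minimal sketch with a function name of our own choosing (`multinomial`); it multiplies the successive binomial coefficients and can be compared with the factorial formula. The group sizes (5, 2, 3) are just an illustrative input.

```python
from math import comb, factorial

def multinomial(*sizes):
    """Number of ways to divide sum(sizes) distinct items into
    distinct groups of the given sizes, via successive choices."""
    remaining = sum(sizes)
    result = 1
    for size in sizes:
        result *= comb(remaining, size)  # choose the next group from what is left
        remaining -= size
    return result

direct = factorial(10) // (factorial(5) * factorial(2) * factorial(3))
print(multinomial(5, 2, 3), direct)  # 2520 2520
```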


Another way to see this result is to consider the n values 1, 1, ..., 1, 2, ..., 2, ...,
r, ..., r, where i appears $n_i$ times, for i = 1, ..., r. Every permutation of these values
corresponds to a division of the n items into the r groups in the following manner: Let
the permutation $i_1, i_2, \ldots, i_n$ correspond to assigning item 1 to group $i_1$, item 2 to
group $i_2$, and so on. For instance, if n = 8 and if $n_1 = 4$, $n_2 = 3$, and $n_3 = 1$, then the
permutation 1, 1, 2, 3, 2, 1, 2, 1 corresponds to assigning items 1, 2, 6, 8 to the first
group, items 3, 5, 7 to the second group, and item 4 to the third group. Because every
permutation yields a division of the items and every possible division results from
some permutation, it follows that the number of divisions of n items into r distinct
groups of sizes $n_1, n_2, \ldots, n_r$ is the same as the number of permutations of n items of
which $n_1$ are alike, and $n_2$ are alike, ..., and $n_r$ are alike, which was shown in
Section 1.3 to equal $\frac{n!}{n_1! \, n_2! \cdots n_r!}$.


Notation

If $n_1 + n_2 + \cdots + n_r = n$, we define $\binom{n}{n_1, n_2, \ldots, n_r}$ by

$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1! \, n_2! \cdots n_r!}$$

Thus, $\binom{n}{n_1, n_2, \ldots, n_r}$ represents the number of possible divisions of n distinct
objects into r distinct groups of respective sizes $n_1, n_2, \ldots, n_r$.


Example 5a 


A police department in a small city consists of 10 officers. If the department 
policy is to have 5 of the officers patrolling the streets, 2 of the officers working 
full time at the station, and 3 of the officers on reserve at the station, how many 
different divisions of the 10 officers into the 3 groups are possible? 


Solution


There are $\frac{10!}{5! \, 2! \, 3!} = 2520$ possible divisions.


Example 5b 


Ten children are to be divided into an A team and a B team of 5 each. The A 
team will play in one league and the B team in another. How many different 
divisions are possible? 


Solution 


There are $\frac{10!}{5! \, 5!} = 252$ possible divisions.


Example 5c 


In order to play a game of basketball, 10 children at a playground divide 
themselves into two teams of 5 each. How many different divisions are possible? 


Solution 


Note that this example is different from Example 5b because now the order of 
the two teams is irrelevant. That is, there is no A or B team, but just a division 
consisting of 2 groups of 5 each. Hence, the desired answer is 


$$\frac{10!/(5! \, 5!)}{2!} = 126$$
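
The difference between Examples 5b and 5c, labelled versus unlabelled teams, can be seen by enumeration. In this sketch the children are simply numbered 0 through 9 (an assumption for illustration); a labelled division is an ordered pair of teams, and an unlabelled one is the set of the two teams.

```python
from itertools import combinations

children = frozenset(range(10))

# Example 5b: teams are labelled, so choosing team A fixes the division (A, B).
labelled = [(frozenset(a), children - frozenset(a))
            for a in combinations(sorted(children), 5)]

# Example 5c: teams are unlabelled, so (A, B) and (B, A) are the same division.
unlabelled = {frozenset(pair) for pair in labelled}

print(len(labelled), len(unlabelled))  # 252 126
```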


The proof of the following theorem, which generalizes the binomial theorem, is left as 
an exercise. 


The multinomial theorem

$$(x_1 + x_2 + \cdots + x_r)^n = \sum_{\substack{(n_1, \ldots, n_r): \\ n_1 + \cdots + n_r = n}} \binom{n}{n_1, n_2, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r}$$

That is, the sum is over all nonnegative integer-valued vectors $(n_1, n_2, \ldots, n_r)$
such that $n_1 + n_2 + \cdots + n_r = n$.


The numbers $\binom{n}{n_1, n_2, \ldots, n_r}$ are known as multinomial coefficients.


Example 5d 


In the first round of a knockout tournament involving $n = 2^m$ players, the n
players are divided into n/2 pairs, with each of these pairs then playing a game.
The losers of the games are eliminated while the winners go on to the next 
round, where the process is repeated until only a single player remains. Suppose 
we have a knockout tournament of 8 players. 


a. How many possible outcomes are there for the initial round? (For 
instance, one outcome is that 1 beats 2, 3 beats 4, 5 beats 6, and 7 beats 
8.) 

b. How many outcomes of the tournament are possible, where an outcome 
gives complete information for all rounds? 


Solution 


One way to determine the number of possible outcomes for the initial round is to
first determine the number of possible pairings for that round. To do so, note that
the number of ways to divide the 8 players into a first pair, a second pair, a third
pair, and a fourth pair is $\binom{8}{2,2,2,2} = \frac{8!}{2^4}$. Thus, the number of possible pairings
when there is no ordering of the 4 pairs is $\frac{8!}{2^4 \, 4!}$. For each such pairing, there are
2 possible choices from each pair as to the winner of that game, showing that
there are $\frac{8! \, 2^4}{2^4 \, 4!} = \frac{8!}{4!}$ possible results of round 1. [Another way to see this is to
note that there are $\binom{8}{4}$ possible choices of the 4 winners and, for each such
choice, there are 4! ways to pair the 4 winners with the 4 losers, showing that
there are $4! \binom{8}{4} = \frac{8!}{4!}$ possible results for the first round.]

Similarly, for each result of round 1, there are $\frac{4!}{2!}$ possible outcomes of round 2,
and for each of the outcomes of the first two rounds, there are $\frac{2!}{1!}$ possible
outcomes of round 3. Consequently, by the generalized basic principle of
counting, there are $\frac{8!}{4!} \cdot \frac{4!}{2!} \cdot \frac{2!}{1!} = 8!$ possible outcomes of the tournament. Indeed,
the same argument can be used to show that a knockout tournament of $n = 2^m$
players has n! possible outcomes.


Knowing the preceding result, it is not difficult to come up with a more direct 
argument by showing that there is a one-to-one correspondence between the set 
of possible tournament results and the set of permutations of 1, ... ,n. To obtain 
such a correspondence, rank the players as follows for any tournament result: 
Give the tournament winner rank 1, and give the final-round loser rank 2. For the
two players who lost in the next-to-last round, give rank 3 to the one who lost to
the player ranked 1 and give rank 4 to the one who lost to the player ranked 2.
For the four players who lost in the round before that, give rank 5 to the one
who lost to the player ranked 1, rank 6 to the one who lost to the player ranked 2,
rank 7 to the one who lost to the player ranked 3, and rank 8 to the one who lost
to the player ranked 4. Continuing on in this manner gives a rank to each player.
(A more succinct description is to give the winner of the tournament rank 1 and
let the rank of a player who lost in a round having $2^k$ matches be $2^k$ plus the rank
of the player who beat him, for k = 0, ..., m - 1.) In this manner, the result of the
tournament can be represented by a permutation $i_1, i_2, \ldots, i_n$, where $i_j$ is the
player who was given rank j. Because different tournament results give rise to
different permutations, and because there is a tournament result for each
permutation, it follows that there are the same number of possible tournament
results as there are permutations of 1, ..., n.
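
For small tournaments, the claim that there are n! outcomes can be checked by exhaustive recursion over pairings and winners. The sketch below is an illustration only; the function names `pairings` and `count_outcomes` are ours, and the enumeration is feasible only for very small n.

```python
from math import factorial

def pairings(players):
    """Yield every way to split a list of players into unordered pairs."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def count_outcomes(players):
    """Count complete tournament outcomes (pairings and winners in all rounds)."""
    if len(players) == 1:
        return 1
    total = 0
    for pairing in pairings(players):
        # Bit g of mask says which member of pair g wins its game.
        for mask in range(2 ** len(pairing)):
            winners = [pair[(mask >> g) & 1] for g, pair in enumerate(pairing)]
            total += count_outcomes(winners)
    return total

print(count_outcomes(list(range(8))), factorial(8))  # 40320 40320
```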


Example 5e 


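$$\begin{aligned}
(x_1 + x_2 + x_3)^2 &= \binom{2}{2,0,0} x_1^2 x_2^0 x_3^0 + \binom{2}{0,2,0} x_1^0 x_2^2 x_3^0 + \binom{2}{0,0,2} x_1^0 x_2^0 x_3^2 \\
&\quad + \binom{2}{1,1,0} x_1^1 x_2^1 x_3^0 + \binom{2}{1,0,1} x_1^1 x_2^0 x_3^1 + \binom{2}{0,1,1} x_1^0 x_2^1 x_3^1 \\
&= x_1^2 + x_2^2 + x_3^2 + 2x_1x_2 + 2x_1x_3 + 2x_2x_3
\end{aligned}$$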
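
The coefficients in Example 5e can be recovered by multiplying out the product term by term. In this sketch (function names are ours), each of the n factors contributes one of the r variables, and terms with the same exponent vector are collected; the collected counts are then compared with the multinomial coefficients.

```python
from collections import Counter
from itertools import product
from math import factorial

def expansion_coefficients(r, n):
    """Expand (x_1 + ... + x_r)^n; map exponent vectors to coefficients."""
    coeffs = Counter()
    for choice in product(range(r), repeat=n):  # one variable from each factor
        exponents = tuple(choice.count(i) for i in range(r))
        coeffs[exponents] += 1
    return coeffs

def multinomial_coefficient(exponents):
    """n! / (n_1! n_2! ... n_r!) for the given exponent vector."""
    result = factorial(sum(exponents))
    for e in exponents:
        result //= factorial(e)
    return result

coeffs = expansion_coefficients(3, 2)
for exponents, coefficient in sorted(coeffs.items(), reverse=True):
    print(exponents, coefficient, multinomial_coefficient(exponents))
```

Each printed line shows an exponent vector, the coefficient found by expansion, and the corresponding multinomial coefficient; the last two columns should agree.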


* 1.6 The Number of Integer Solutions of Equations


* Asterisks denote material that is optional. 


An individual has gone fishing at Lake Ticonderoga, which contains four types of fish: 
lake trout, catfish, bass, and bluefish. If we take the result of the fishing trip to be the 
numbers of each type of fish caught, let us determine the number of possible 
outcomes when a total of 10 fish are caught. To do so, note that we can denote the 
outcome of the fishing trip by the vector $(x_1, x_2, x_3, x_4)$ where $x_1$ is the number of
trout that are caught, $x_2$ is the number of catfish, $x_3$ is the number of bass, and $x_4$ is
the number of bluefish. Thus, the number of possible outcomes when a total of 10
fish are caught is the number of nonnegative integer vectors $(x_1, x_2, x_3, x_4)$ that sum
to 10.

More generally, if we supposed there were r types of fish and that a total of n were
caught, then the number of possible outcomes would be the number of nonnegative
integer-valued vectors $x_1, \ldots, x_r$ such that

$$x_1 + x_2 + \cdots + x_r = n \tag{6.1}$$


To compute this number, let us start by considering the number of positive
integer-valued vectors $x_1, \ldots, x_r$ that satisfy the preceding. To determine this number,
suppose that we have n consecutive zeroes lined up in a row:

0 0 0 ... 0 0

Note that any selection of r - 1 of the n - 1 spaces between adjacent zeroes (see
Figure 1.2) corresponds to a positive solution of (6.1) by letting $x_1$ be the
number of zeroes before the first chosen space, $x_2$ be the number of zeroes between
the first and second chosen space, ..., and $x_r$ being the number of zeroes following
the last chosen space.


Figure 1.2 Number of positive solutions.


0 ^ 0 ^ 0 ^ . . . ^ 0 ^ 0


n objects 0; choose $r - 1$ of the $n - 1$ spaces ^.


For instance, if we have n = 8 and r = 3, then (with the choices represented by dots) 
the choice 


0.0000.000 


corresponds to the solution $x_1 = 1, x_2 = 4, x_3 = 3$. As positive solutions of (6.1)
correspond, in a one-to-one fashion, to choices of $r - 1$ of the adjacent spaces, it
follows that the number of different positive solutions is equal to the number of
different selections of $r - 1$ of the $n - 1$ adjacent spaces. Consequently, we have the
following proposition. 


Proposition 6.1 
There are $\binom{n-1}{r-1}$ distinct positive integer-valued vectors $(x_1, x_2, \ldots, x_r)$
satisfying the equation


$$x_1 + x_2 + \cdots + x_r = n, \qquad x_i > 0, \; i = 1, \ldots, r$$


To obtain the number of nonnegative (as opposed to positive) solutions, note that the
number of nonnegative solutions of $x_1 + x_2 + \cdots + x_r = n$ is the same as the number
of positive solutions of $y_1 + \cdots + y_r = n + r$ (seen by letting $y_i = x_i + 1$, $i = 1, \ldots, r$).
Hence, from Proposition 6.1, we obtain the following proposition.


Proposition 6.2 


There are $\binom{n+r-1}{r-1}$ distinct nonnegative integer-valued vectors $(x_1, x_2, \ldots, x_r)$
satisfying the equation


$$x_1 + x_2 + \cdots + x_r = n$$


Thus, using Proposition 6.2, we see that there are $\binom{13}{3} = 286$ possible
outcomes when a total of 10 Lake Ticonderoga fish are caught.
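

Both propositions can be checked by direct enumeration for small values of n and r. The sketch below is illustrative only (the function name `count_solutions` is an assumption, not notation from the text); it counts solutions by brute force and compares the counts with the closed forms of Propositions 6.1 and 6.2.

```python
from itertools import product
from math import comb


def count_solutions(n, r, minimum=0):
    """Count integer vectors (x_1, ..., x_r) with every x_i >= minimum that sum to n."""
    return sum(
        1 for xs in product(range(minimum, n + 1), repeat=r) if sum(xs) == n
    )


# Positive solutions with n = 8, r = 3 (the case drawn with dots above).
positive = count_solutions(8, 3, minimum=1)
print(positive, comb(8 - 1, 3 - 1))  # 21 21

# Nonnegative solutions for the fishing trip: n = 10 fish, r = 4 types.
nonnegative = count_solutions(10, 4)
print(nonnegative, comb(10 + 4 - 1, 4 - 1))  # 286 286
```

Enumeration grows like $(n+1)^r$, so it is useful only as a sanity check; the formulas are what make large cases tractable.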


Example 6a 


How many distinct nonnegative integer-valued solutions of $x_1 + x_2 = 3$ are
possible?


Solution 


There are $\binom{3+2-1}{2-1} = 4$ such solutions: (0, 3), (1, 2), (2, 1), (3, 0).


Example 6b 


An investor has $20,000 to invest among 4 possible investments. Each 
investment must be in units of $1000. If the total $20,000 is to be invested, how 
many different investment strategies are possible? What if not all the money 
needs to be invested? 


Solution 


If we let $x_i$, $i = 1, 2, 3, 4$, denote the number of thousands invested in investment
$i$, then, when all is to be invested, $x_1, x_2, x_3, x_4$ are integers satisfying the
equation


$$x_1 + x_2 + x_3 + x_4 = 20, \qquad x_i \ge 0$$


Hence, by Proposition 6.2, there are $\binom{23}{3} = 1771$ possible investment
strategies. If not all of the money needs to be invested, then if we let $x_5$ denote
the amount kept in reserve, a strategy is a nonnegative integer-valued vector
$(x_1, x_2, x_3, x_4, x_5)$ satisfying the equation


$$x_1 + x_2 + x_3 + x_4 + x_5 = 20$$


Hence, by Proposition 6.2, there are now $\binom{24}{4} = 10{,}626$ possible strategies.
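

The reserve variable in Example 6b is a standard device: counting vectors whose sum is at most 20 is the same as counting vectors with one extra coordinate whose sum is exactly 20. As a hedged numerical check (a sketch with illustrative variable names, not part of the original text), both counts can be enumerated directly.

```python
from itertools import product
from math import comb

# Strategies that invest the full $20,000 across 4 investments.
exactly_20 = sum(1 for xs in product(range(21), repeat=4) if sum(xs) == 20)

# Strategies that may leave some money uninvested: the sum is at most 20.
at_most_20 = sum(1 for xs in product(range(21), repeat=4) if sum(xs) <= 20)

print(exactly_20, comb(23, 3))  # 1771 1771
print(at_most_20, comb(24, 4))  # 10626 10626
```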


Example 6c 


How many terms are there in the multinomial expansion of $(x_1 + x_2 + \cdots + x_r)^n$?


Solution


$$(x_1 + x_2 + \cdots + x_r)^n = \sum \binom{n}{n_1, \ldots, n_r} x_1^{n_1} \cdots x_r^{n_r}$$


where the sum is over all nonnegative integer-valued $(n_1, \ldots, n_r)$ such that
$n_1 + \cdots + n_r = n$. Hence, by Proposition 6.2, there are $\binom{n+r-1}{r-1}$ such
terms.


Example 6d 


Let us consider again Example 4c, in which we have a set of n items, of
which m are (indistinguishable and) defective and the remaining $n - m$ are (also
indistinguishable and) functional. Our objective is to determine the number of
linear orderings in which no two defectives are next to each other. To determine
this number, let us imagine that the defective items are lined up among
themselves and the functional ones are now to be put in position. Let us denote
$x_1$ as the number of functional items to the left of the first defective, $x_2$ as the
number of functional items between the first two defectives, and so on. That is,
schematically, we have


$$x_1 \; 0 \; x_2 \; 0 \; \cdots \; x_m \; 0 \; x_{m+1}$$


Now, there will be at least one functional item between any pair of defectives as
long as $x_i > 0$, $i = 2, \ldots, m$. Hence, the number of outcomes satisfying the
condition is the number of vectors $x_1, \ldots, x_{m+1}$ that satisfy the equation


$$x_1 + \cdots + x_{m+1} = n - m, \qquad x_1 \ge 0, \; x_{m+1} \ge 0, \; x_i > 0, \; i = 2, \ldots, m$$


But, on letting $y_1 = x_1 + 1$, $y_i = x_i$, $i = 2, \ldots, m$, $y_{m+1} = x_{m+1} + 1$, we see that this
number is equal to the number of positive vectors $(y_1, \ldots, y_{m+1})$ that satisfy the
equation


$$y_1 + y_2 + \cdots + y_{m+1} = n - m + 2$$


Hence, by Proposition 6.1, there are $\binom{n-m+1}{m}$ such outcomes, in
agreement with the results of Example 4c.


Suppose now that we are interested in the number of outcomes in which each
pair of defective items is separated by at least 2 functional items. By the same
reasoning as that applied previously, this would equal the number of vectors
satisfying the equation


$$x_1 + \cdots + x_{m+1} = n - m, \qquad x_1 \ge 0, \; x_{m+1} \ge 0, \; x_i \ge 2, \; i = 2, \ldots, m$$


Upon letting $y_1 = x_1 + 1$, $y_i = x_i - 1$, $i = 2, \ldots, m$, $y_{m+1} = x_{m+1} + 1$, we see that
this is the same as the number of positive solutions of the equation


$$y_1 + \cdots + y_{m+1} = n - 2m + 3$$


Hence, from Proposition 6.1, there are $\binom{n-2m+2}{m}$ such outcomes.
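

Both counts in Example 6d can be confirmed by enumeration for small cases. In the sketch below (illustrative only; `count_orderings` and its `gap` parameter are assumptions introduced here), an ordering is identified with the set of positions occupied by the m defective items, and `gap` is the minimum number of functional items required between consecutive defectives.

```python
from itertools import combinations
from math import comb


def count_orderings(n, m, gap):
    """Count ways to place m defectives in n positions with at least
    `gap` functional items between every pair of consecutive defectives."""
    count = 0
    for positions in combinations(range(n), m):
        # Consecutive defectives at positions a < b have b - a - 1 items between them.
        if all(b - a - 1 >= gap for a, b in zip(positions, positions[1:])):
            count += 1
    return count


n, m = 10, 3
no_two_adjacent = count_orderings(n, m, gap=1)
two_apart = count_orderings(n, m, gap=2)
print(no_two_adjacent, comb(n - m + 1, m))    # 56 56
print(two_apart, comb(n - 2 * m + 2, m))      # 20 20
```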


Summary 


The basic principle of counting states that if an experiment consisting of two phases 
is such that there are n possible outcomes of phase 1 and, for each of these n 
outcomes, there are m possible outcomes of phase 2, then there are nm possible 
outcomes of the experiment. 


There are $n! = n(n-1) \cdots 3 \cdot 2 \cdot 1$ possible linear orderings of n items. The quantity $0!$
is defined to equal 1.


Let


$$\binom{n}{i} = \frac{n!}{(n-i)!\, i!}$$


when $0 \le i \le n$, and let it equal 0 otherwise. This quantity represents the number of
different subgroups of size i that can be chosen from a set of size n. It is often called
a binomial coefficient because of its prominence in the binomial theorem, which
states that


$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$$


For nonnegative integers $n_1, \ldots, n_r$ summing to n,


$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}$$


is the number of divisions of n items into r distinct nonoverlapping subgroups of
sizes $n_1, n_2, \ldots, n_r$. These quantities are called multinomial coefficients.


Problems 


1. a. How many different 7-place license plates are possible if the first
2 places are for letters and the other 5 for numbers?

b. Repeat part (a) under the assumption that no letter or number 
can be repeated in a single license plate. 


2. How many outcome sequences are possible when a die is rolled four 
times, where we say, for instance, that the outcome is 3, 4, 3, 1 if the
first roll landed on 3, the second on 4, the third on 3, and the fourth on 
1? 

3. Twenty workers are to be assigned to 20 different jobs, one to each 
job. How many different assignments are possible? 

4. John, Jim, Jay, and Jack have formed a band consisting of 4 
instruments. If each of the boys can play all 4 instruments, how many 
different arrangements are possible? What if John and Jim can play all 
4 instruments, but Jay and Jack can each play only piano and drums? 
5. For years, telephone area codes in the United States and Canada 
consisted of a sequence of three digits. The first digit was an integer 
between 2 and 9, the second digit was either 0 or 1, and the third digit 
was any integer from 1 to 9. How many area codes were possible? 
How many area codes starting with a 4 were possible? 

6. A well-known nursery rhyme starts as follows: 
“As I was going to St. Ives

I met a man with 7 wives.

Each wife had 7 sacks. 

Each sack had 7 cats. 

Each cat had 7 kittens...” 

How many kittens did the traveler meet? 


7. a. In how many ways can 3 boys and 3 girls sit in a row?

b. In how many ways can 3 boys and 3 girls sit in a row if the boys 
and the girls are each to sit together? 

c. In how many ways if only the boys must sit together? 

d. In how many ways if no two people of the same sex are allowed 
to sit together? 


8. When all letters are used, how many different letter arrangements 
can be made from the letters 

a. Fluke? 

b. Propose? 

c. Mississippi? 

d. Arrange? 


9. A child has 12 blocks, of which 6 are black, 4 are red, 1 is white, and
1 is blue. If the child puts the blocks in a line, how many arrangements 
are possible? 
10. In how many ways can 8 people be seated in a row if 

a. there are no restrictions on the seating arrangement? 

b. persons A and B must sit next to each other? 

c. there are 4 men and 4 women and no 2 men or 2 women can sit 

next to each other? 
d. there are 5 men and they must sit next to one another? 
e. there are 4 married couples and each couple must sit together? 


11. In how many ways can 3 novels, 2 mathematics books, and 1 
chemistry book be arranged on a bookshelf if 
a. the books can be arranged in any order? 
b. the mathematics books must be together and the novels must 
be together? 
c. the novels must be together, but the other books can be 
arranged in any order? 
12. How many 3 digit numbers xyz, with x, y, z all ranging from 0 to 9,
have at least 2 of their digits equal? How many have exactly 2 equal
digits?
13. How many different letter permutations, of any length, can be made
using the letters M O T T O? (For instance, there are 3 possible
permutations of length 1.)
14. Five separate awards (best scholarship, best leadership qualities, 
and so on) are to be presented to selected students from a class of 30. 
How many different outcomes are possible if 

a. a student can receive any number of awards?

b. each student can receive at most 1 award? 


15. Consider a group of 20 people. If everyone shakes hands with 
everyone else, how many handshakes take place? 
16. How many 5-card poker hands are there? 
17. A dance class consists of 22 students, of which 10 are women and 
12 are men. If 5 men and 5 women are to be chosen and then paired 
off, how many results are possible? 
18. A student has to sell 2 books from a collection of 6 math, 7 science, 
and 4 economics books. How many choices are possible if 

a. both books are to be on the same subject? 

b. the books are to be on different subjects? 


19. Seven different gifts are to be distributed among 10 children. How 
many distinct results are possible if no child is to receive more than one 
gift? 
20. A committee of 7, consisting of 2 Republicans, 2 Democrats, and 3 
Independents, is to be chosen from a group of 5 Republicans, 6 
Democrats, and 4 Independents. How many committees are possible? 
21. From a group of 8 women and 6 men, a committee consisting of 3 
men and 3 women is to be formed. How many different committees are 
possible if 

a. 2 of the men refuse to serve together? 

b. 2 of the women refuse to serve together? 

c. 1 man and 1 woman refuse to serve together? 


22. A person has 8 friends, of whom 5 will be invited to a party. 
a. How many choices are there if 2 of the friends are feuding and 
will not attend together? 
b. How many choices if 2 of the friends will only attend together? 


23. Consider the grid of points shown below.
Suppose that, starting at the point labeled A, you can go one step up or
one step to the right at each move. This procedure is continued until
the point labeled B is reached. How many different paths from A to B
are possible?

Hint: Note that to reach B from A, you must take 4 steps to the right
and 3 steps upward.

(Figure: rectangular grid of points with A at the lower left corner and B at the upper right corner.)


24. In Problem 23, how many different paths are there from A to B
that go through the point circled in the following lattice?

(Figure: the lattice of Problem 23 with one point circled.)


25. A psychology laboratory conducting dream research contains 3 
rooms, with 2 beds in each room. If 3 sets of identical twins are to be 
assigned to these 6 beds so that each set of twins sleeps in different 
beds in the same room, how many assignments are possible? 

26. 
27. Expand $(3x^2 + y)^5$.
28. The game of bridge is played by 4 players, each of whom is dealt 
13 cards. How many bridge deals are possible? 
29. Expand $(x_1 + 2x_2 + 3x_3)^4$.
30. If 12 people are to be divided into 3 committees of respective sizes 
3, 4, and 5, how many divisions are possible? 
31. If 8 new teachers are to be divided among 4 schools, how many 
divisions are possible? What if each school must receive 2 teachers? 
32. Ten weight lifters are competing in a team weight-lifting contest. Of 
the lifters, 3 are from the United States, 4 are from Russia, 2 are from 
China, and 1 is from Canada. If the scoring takes account of the 
countries that the lifters represent, but not their individual identities, 
how many different outcomes are possible from the point of view of 
scores? How many different outcomes correspond to results in which 
the United States has 1 competitor in the top three and 2 in the bottom 
three? 
33. Delegates from 10 countries, including Russia, France, England, 
and the United States, are to be seated in a row. How many different 
seating arrangements are possible if the French and English delegates 
are to be seated next to each other and the Russian and U.S. 
delegates are not to be next to each other? 
* 34. If 8 identical blackboards are to be divided among 4 schools, how 
many divisions are possible? How many if each school must receive at 
least 1 blackboard? 
* 35. An elevator starts at the basement with 8 people (not including the 
elevator operator) and discharges them all by the time it reaches the 
top floor, number 6. In how many ways could the operator have 
perceived the people leaving the elevator if all people look alike to him? 
What if the 8 people consisted of 5 men and 3 women and the operator 
could tell a man from a woman? 
* 36. We have $20,000 that must be invested among 4 possible 
opportunities. Each investment must be integral in units of $1000, and 
there are minimal investments that need to be made if one is to invest 
in these opportunities. The minimal investments are $2000, $2000, 
$3000, and $4000. How many different investment strategies are 
available if 

a. an investment must be made in each opportunity? 

b. investments must be made in at least 3 of the 4 opportunities? 


* 37. Suppose that 10 fish are caught at a lake that contains 5 distinct 
types of fish. 

a. How many different outcomes are possible, where an outcome
specifies the numbers of caught fish of each of the 5 types? 

b. How many outcomes are possible when 3 of the 10 fish caught 
are trout? 

c. How many when at least 2 of the 10 are trout? 


Theoretical Exercises 


1. Prove the generalized version of the basic counting principle. 

2. Two experiments are to be performed. The first can result in any one of m 
possible outcomes. If the first experiment results in outcome i, then the 
second experiment can result in any of $n_i$ possible outcomes, $i = 1, 2, \ldots, m$.
What is the number of possible outcomes of the two experiments? 

3. In how many ways can r objects be selected from a set of n objects if the 
order of selection is considered relevant? 


4. There are $\binom{n}{r}$ different linear arrangements of n balls of which r are black
and $n - r$ are white. Give a combinatorial explanation of this fact.
5. Determine the number of vectors $(x_1, \ldots, x_n)$, such that each $x_i$ is either 0 or
1 and 


6. How many vectors $x_1, \ldots, x_k$ are there for which each $x_i$ is a positive integer
such that $1 \le x_i \le n$ and $x_1 < x_2 < \cdots < x_k$?
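7. Give an analytic proof of Equation (4.1).
8. Prove that


$$\binom{n+m}{r} = \binom{n}{0}\binom{m}{r} + \binom{n}{1}\binom{m}{r-1} + \cdots + \binom{n}{r}\binom{m}{0}$$


Hint: Consider a group of n men and m women. How many groups of size r
are possible?
9. Use Theoretical Exercise 8 to prove that


$$\binom{2n}{n} = \sum_{k=0}^{n} \binom{n}{k}^2$$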


10. From a group of n people, suppose that we want to choose a committee of 
$k$, $k \le n$, one of whom is to be designated as chairperson.
a. By focusing first on the choice of the committee and then on the choice
of the chair, argue that there are $\binom{n}{k} k$ possible choices.
b. By focusing first on the choice of the nonchair committee members and
then on the choice of the chair, argue that there are $\binom{n}{k-1}(n - k + 1)$
possible choices.
c. By focusing first on the choice of the chair and then on the choice of
the other committee members, argue that there are $n \binom{n-1}{k-1}$ possible
choices.
d. Conclude from parts (a), (b), and (c) that


$$k \binom{n}{k} = (n - k + 1)\binom{n}{k-1} = n \binom{n-1}{k-1}$$


e. Use the factorial definition of $\binom{m}{r}$ to verify the identity in part (d).


11. The following identity is known as Fermat’s combinatorial identity:


$$\binom{n}{k} = \sum_{i=k}^{n} \binom{i-1}{k-1}, \qquad n \ge k$$


Give a combinatorial argument (no computations are needed) to establish this 
identity. 

Hint: Consider the set of numbers 1 through n. How many subsets of size k 
have i as their highest numbered member? 

12. Consider the following combinatorial identity:


$$\sum_{k=1}^{n} k \binom{n}{k} = n \cdot 2^{n-1}$$


a. Present a combinatorial argument for this identity by considering a set 
of n people and determining, in two ways, the number of possible 
selections of a committee of any size and a chairperson for the 
committee. 

Hint: 
i. How many possible selections are there of a committee of size k 
and its chairperson? 
ii. How many possible selections are there of a chairperson and 
the other committee members? 


b. Verify the following identity for n = 1, 2, 3, 4, 5:
$$\sum_{k=1}^{n} \binom{n}{k} k^2 = 2^{n-2} n(n + 1)$$


For a combinatorial proof of the preceding, consider a set of n people 
and argue that both sides of the identity represent the number of 
different selections of a committee, its chairperson, and its secretary 
(possibly the same as the chairperson). 
Hint: 
i. How many different selections result in the committee containing 
exactly k people? 
ii. How many different selections are there in which the 
chairperson and the secretary are the same? (ANSWER: $n 2^{n-1}$.)
iii. How many different selections result in the chairperson and the 
secretary being different? 


c. Now argue that
$$\sum_{k=1}^{n} \binom{n}{k} k^3 = 2^{n-3} n^2 (n + 3)$$


13. Show that, for n > 0, 


Hint: Use the binomial theorem. 
14. From a set of n people, a committee of size j is to be chosen, and from 
this committee, a subcommittee of size i,i < j, is also to be chosen. 

a. Derive a combinatorial identity by computing, in two ways, the number 
of possible choices of the committee and subcommittee—first by 
supposing that the committee is chosen first and then the 
subcommittee is chosen, and second by supposing that the 
subcommittee is chosen first and then the remaining members of the 
committee are chosen. 

b. Use part (a) to prove the following combinatorial identity:


$$\sum_{j=i}^{n} \binom{n}{j}\binom{j}{i} = \binom{n}{i} 2^{n-i}, \qquad i \le n$$


c. Use part (a) and Theoretical Exercise 13 to show that


$$\sum_{j=i}^{n} \binom{n}{j}\binom{j}{i} (-1)^{n-j} = 0, \qquad i < n$$


15. Let $H_k(n)$ be the number of vectors $x_1, \ldots, x_k$ for which each $x_i$ is a
positive integer satisfying $1 \le x_i \le n$ and $x_1 \le x_2 \le \cdots \le x_k$.
a. Without any computations, argue that


$$H_1(n) = n$$

$$H_k(n) = \sum_{j=1}^{n} H_{k-1}(j), \qquad k > 1$$


Hint: How many vectors are there in which $x_k = j$?
b. Use the preceding recursion to compute $H_3(5)$.
Hint: First compute $H_2(n)$ for n = 1, 2, 3, 4, 5.


16. Consider a tournament of n contestants in which the outcome is an 
ordering of these contestants, with ties allowed. That is, the outcome 
partitions the players into groups, with the first group consisting of the players 
who tied for first place, the next group being those who tied for the next-best 
position, and so on. Let N(n) denote the number of different possible 
outcomes. For instance, N(2) = 3, since, in a tournament with 2 contestants, 
player 1 could be uniquely first, player 2 could be uniquely first, or they could 
tie for first.

a. List all the possible outcomes when n = 3. 

b. With N(0) defined to equal 1, argue, without any computations, that


$$N(n) = \sum_{i=1}^{n} \binom{n}{i} N(n - i)$$


Hint: How many outcomes are there in which i players tie for last
place?
c. Show that the formula of part (b) is equivalent to the following:


$$N(n) = \sum_{i=0}^{n-1} \binom{n}{i} N(i)$$


d. Use the recursion to find N(3) and N(4). 


17. Present a combinatorial explanation of why $\binom{n}{r} = \binom{n}{r,\, n-r}$.


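18. Argue that


$$\binom{n}{n_1, n_2, \ldots, n_r} = \binom{n-1}{n_1 - 1, n_2, \ldots, n_r} + \binom{n-1}{n_1, n_2 - 1, \ldots, n_r} + \cdots + \binom{n-1}{n_1, n_2, \ldots, n_r - 1}$$


Hint: Use an argument similar to the one used to establish Equation (4.1).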
19. Prove the multinomial theorem. 

* 20. In how many ways can n identical balls be distributed into r urns so that
the ith urn contains at least $m_i$ balls, for each $i = 1, \ldots, r$? Assume that
$n \ge \sum_{i=1}^{r} m_i$.


* 21. Argue that there are exactly $\binom{r}{k}\binom{n-1}{n-r+k}$ solutions of


$$x_1 + x_2 + \cdots + x_r = n$$


for which exactly k of the $x_i$ are equal to 0.

* 22. Consider a function $f(x_1, \ldots, x_n)$ of n variables. How many different
partial derivatives of order r does f possess? 

* 23. Determine the number of vectors $(x_1, \ldots, x_n)$ such that each $x_i$ is a
nonnegative integer and 


Self-Test Problems and Exercises 


1. How many different linear arrangements are there of the letters A, B, C, D, 
E, F for which 

a. A and B are next to each other? 

b. A is before B? 

c. A is before B and B is before C?

d. A is before B and C is before D?

e. A and B are next to each other and C and D are also next to each
other?
f. E is not last in line? 


2. If 4 Americans, 3 French people, and 3 British people are to be seated in a
row, how many seating arrangements are possible when people of the same
nationality must sit next to each other?

3. A president, treasurer, and secretary, all different, are to be chosen from a
club consisting of 10 people. How many different choices of officers are 
possible if 

a. there are no restrictions? 

b. A and B will not serve together? 

c. C and D will serve together or not at all? 

d. E must be an officer? 

e. F will serve only if he is president? 


4. A student is to answer 7 out of 10 questions in an examination. How many 
choices has she? How many if she must answer at least 3 of the first 5 
questions? 

5. In how many ways can a man divide 7 gifts among his 3 children if the 
eldest is to receive 3 gifts and the others 2 each? 

6. How many different 7-place license plates are possible when 3 of the 
entries are letters and 4 are digits? Assume that repetition of letters and 
numbers is allowed and that there is no restriction on where the letters or 
numbers can be placed. 

7. Give a combinatorial explanation of the identity 


8. Consider n-digit numbers where each digit is one of the 10 integers 
0,1, ...,9. How many such numbers are there for which 

a. no two consecutive digits are equal? 

b. 0 appears as a digit a total of i times, i = 0, ... ,n? 


9. Consider three classes, each consisting of n students. From this group of
3n students, a group of 3 students is to be chosen.
a. How many choices are possible? 
b. How many choices are there in which all 3 students are in the same 
class? 
c. How many choices are there in which 2 of the 3 students are in the 
same class and the other student is in a different class? 
d. How many choices are there in which all 3 students are in different 
classes? 
e. Using the results of parts (a) through (d), write a combinatorial identity. 


10. How many 5-digit numbers can be formed from the integers 1,2, ...,9 if no 


digit can appear more than twice? (For instance, 41434 is not allowed.) 
11. From 10 married couples, we want to select a group of 6 people that is not 
allowed to contain a married couple.
a. How many choices are there? 
b. How many choices are there if the group must also consist of 3 men 
and 3 women? 


12. A committee of 6 people is to be chosen from a group consisting of 7 men 
and 8 women. If the committee must consist of at least 3 women and at least 
2 men, how many different committees are possible? 

* 13. An art collection on auction consisted of 4 Dalis, 5 van Goghs, and 6 
Picassos. At the auction were 5 art collectors. If a reporter noted only the 
number of Dalis, van Goghs, and Picassos acquired by each collector, how 
many different results could have been recorded if all of the works were sold? 
* 14. Determine the number of vectors $(x_1, \ldots, x_n)$ such that each $x_i$ is a
positive integer and 


where $k \ge n$.

15. A total of n students are enrolled in a review course for the actuarial 
examination in probability. The posted results of the examination will list the 
names of those who passed, in decreasing order of their scores. For instance, 
the posted result will be “Brown, Cho” if Brown and Cho are the only ones to
pass, with Brown receiving the higher score. Assuming that all scores are 
distinct (no ties), how many posted results are possible? 

16. How many subsets of size 4 of the set S = {1, 2, ... ,20} contain at least 
one of the elements 1, 2,3, 4,5? 

17. Give an analytic verification of 


$$\binom{n}{2} = \binom{k}{2} + k(n - k) + \binom{n-k}{2}, \qquad 1 \le k \le n$$


Now, give a combinatorial argument for this identity. 

18. In a certain community, there are 3 families consisting of a single parent 
and 1 child, 3 families consisting of a single parent and 2 children, 5 families 
consisting of 2 parents and a single child, 7 families consisting of 2 parents 
and 2 children, and 6 families consisting of 2 parents and 3 children. If a
parent and child from the same family are to be chosen, how many possible 
choices are there? 

19. If there are no restrictions on where the digits and letters are placed, how 
many 8-place license plates consisting of 5 letters and 3 digits are possible if 
no repetitions of letters or digits are allowed? What if the 3 digits must be 
consecutive? 


20. Verify the identity


$$\sum_{x_1 + \cdots + x_r = n,\; x_i \ge 0} \frac{n!}{x_1!\, x_2! \cdots x_r!} = r^n$$


a. by a combinatorial argument that first notes that $r^n$ is the number of
different n letter sequences that can be formed from an alphabet
consisting of r letters, and then determines how many of these letter
sequences have letter 1 a total of $x_1$ times and letter 2 a total of $x_2$
times and ... and letter r a total of $x_r$ times;

b. by using the multinomial theorem. 


21. Simplify $n - \binom{n}{2} + \binom{n}{3} - \cdots + (-1)^{n+1}\binom{n}{n}$.


Chapter 2 Axioms of Probability 


Contents 


2.1 Introduction 

2.2 Sample Space and Events 

2.3 Axioms of Probability 

2.4 Some Simple Propositions 

2.5 Sample Spaces Having Equally Likely Outcomes 
2.6 Probability as a Continuous Set Function 


2.7 Probability as a Measure of Belief 


2.1 Introduction 


In this chapter, we introduce the concept of the probability of an event and then show 
how probabilities can be computed in certain situations. As a preliminary, however, 
we need to discuss the concept of the sample space and the events of an 
experiment. 


2.2 Sample Space and Events


Consider an experiment whose outcome is not predictable with certainty. However, 
although the outcome of the experiment will not be known in advance, let us suppose 
that the set of all possible outcomes is known. This set of all possible outcomes of an 
experiment is known as the sample space of the experiment and is denoted by S. 
Following are some examples: 


1. If the outcome of an experiment consists of the determination of the sex of a 
newborn child, then 
S = {g,b} 


where the outcome g means that the child is a girl and b that it is a boy. 
2. If the outcome of an experiment is the order of finish in a race among the 7 
horses having post positions 1, 2, 3, 4, 5, 6, and 7, then 
S = {all 7! permutations of (1, 2, 3, 4, 5, 6, 7)} 


The outcome (2, 3, 1, 6, 5, 4, 7) means, for instance, that the number 2 horse 
comes in first, then the number 3 horse, then the number 1 horse, and so on. 

3. If the experiment consists of flipping two coins, then the sample space 
consists of the following four points: 


S = {(h, h), (h, t), (t, h), (t, t)}


The outcome will be (h, h) if both coins are heads, (h, t) if the first coin is
heads and the second tails, (t, h) if the first is tails and the second heads, and
(t, t) if both coins are tails.
4. If the experiment consists of tossing two dice, then the sample space consists
of the 36 points
S = {(i, j): i, j = 1, 2, 3, 4, 5, 6}


where the outcome (i, j) is said to occur if i appears on the leftmost die and j 
on the other die. 
5. If the experiment consists of measuring (in hours) the lifetime of a transistor, 
then the sample space consists of all nonnegative real numbers; that is, 
S = {x: 0 ≤ x < ∞}


Any subset E of the sample space is known as an event. In other words, an event is 
a set consisting of possible outcomes of the experiment. If the outcome of the 
experiment is contained in E, then we say that E has occurred. Following are some
examples of events. 


In the preceding Example 1, if E = {g}, then E is the event that the child is a girl.
Similarly, if F = {b}, then F is the event that the child is a boy. 


In Example 2, if 


E = {all outcomes in S starting with a 3} 


then E is the event that horse 3 wins the race. 


In Example 3, if E = {(h, h), (h, t)}, then E is the event that a head appears on the 
first coin. 


In Example 4, if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then E is the event that 
the sum of the dice equals 7. 


In Example 5, if E = {x: 0 ≤ x ≤ 5}, then E is the event that the transistor does not
last longer than 5 hours. 


For any two events E and F of a sample space S, we define the new event E ∪ F to
consist of all outcomes that are either in E or in F or in both E and F. That is, the
event E ∪ F will occur if either E or F occurs. For instance, in Example 1, if E = {g} is
the event that the child is a girl and F = {b} is the event that the child is a boy, then


E ∪ F = {g, b}


is the whole sample space S. In Example 3, if E = {(h, h), (h, t)} is the event that the
first coin lands heads, and F = {(t, h), (h, h)} is the event that the second coin lands
heads, then


E ∪ F = {(h, h), (h, t), (t, h)}


is the event that at least one of the coins lands heads and thus will occur provided
that both coins do not land tails.


The event E ∪ F is called the union of the event E and the event F.


Similarly, for any two events E and F, we may also define the new event EF, called
the intersection of E and F, to consist of all outcomes that are both in E and in F. That
is, the event EF (sometimes written E ∩ F) will occur only if both E and F occur. For
instance, in Example 3, if E = {(h, h), (h, t), (t, h)} is the event that at least 1 head
occurs and F = {(h, t), (t, h), (t, t)} is the event that at least 1 tail occurs, then


EF = {(h, t), (t, h)}


is the event that exactly 1 head and 1 tail occur. In Example 4, if
E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} is the event that the sum of the dice is 7
and F = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} is the event that the sum is 6, then the
event EF does not contain any outcomes and hence could not occur. To give such an
event a name, we shall refer to it as the null event and denote it by ∅. (That is, ∅
refers to the event consisting of no outcomes.) If EF = ∅, then E and F are said to be
mutually exclusive.
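

Events are ordinary sets, so unions and intersections can be computed directly. The following is a small illustrative sketch (the variable names are assumptions, not notation from the text) that models the two-dice sample space of Example 4 with Python sets and confirms that the events "sum is 7" and "sum is 6" are mutually exclusive.

```python
from itertools import product

# Sample space for two dice: all ordered pairs (i, j) with i, j in 1..6.
S = set(product(range(1, 7), repeat=2))

E = {(i, j) for (i, j) in S if i + j == 7}  # the sum of the dice is 7
F = {(i, j) for (i, j) in S if i + j == 6}  # the sum of the dice is 6

union = E | F          # occurs if either E or F occurs
intersection = E & F   # occurs only if both E and F occur

print(len(S), len(E), len(F))   # 36 6 5
print(len(union))               # 11
print(intersection == set())    # True: EF is the null event
```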


We define unions and intersections of more than two events in a similar manner. If
$E_1, E_2, \ldots$ are events, then the union of these events, denoted by $\bigcup_{n=1}^{\infty} E_n$, is defined
to be that event that consists of all outcomes that are in $E_n$ for at least one value of
$n = 1, 2, \ldots$. Similarly, the intersection of the events $E_n$, denoted by $\bigcap_{n=1}^{\infty} E_n$, is
defined to be the event consisting of those outcomes that are in all of the events
$E_n$, $n = 1, 2, \ldots$.


Finally, for any event E, we define the new event $E^c$, referred to as the complement of
E, to consist of all outcomes in the sample space S that are not in E. That is, $E^c$ will
occur if and only if E does not occur. In Example 4, if event
E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, then $E^c$ will occur when the sum of the
dice does not equal 7. Note that because the experiment must result in some
outcome, it follows that $S^c$ = ∅.


For any two events E and F, if all of the outcomes in E are also in F, then we say that
E is contained in F, or E is a subset of F, and write $E \subset F$ (or equivalently, $F \supset E$,
which we sometimes say as F is a superset of E). Thus, if $E \subset F$, then the occurrence
of E implies the occurrence of F. If $E \subset F$ and $F \subset E$, we say that E and F are equal
and write E = F.


A graphical representation that is useful for illustrating logical relations among events 
is the Venn diagram. The sample space S is represented as consisting of all the 
outcomes in a large rectangle, and the events E, F, G,... are represented as 
consisting of all the outcomes in given circles within the rectangle. Events of interest 
can then be indicated by shading appropriate regions of the diagram. For instance, in 
the three Venn diagrams shown in Figure 2.1, the shaded areas represent,
respectively, the events $E \cup F$, $EF$, and $E^c$. The Venn diagram in Figure 2.2
indicates that $E \subset F$.


Figure 2.1 Venn diagrams.

(a) Shaded region: $E \cup F$.
(b) Shaded region: $EF$.
(c) Shaded region: $E^c$.


Figure 2.2 $E \subset F$.


The operations of forming unions, intersections, and complements of events obey
certain rules similar to the rules of algebra. We list a few of these rules: 


Commutative laws    $E \cup F = F \cup E$                       $EF = FE$
Associative laws    $(E \cup F) \cup G = E \cup (F \cup G)$     $(EF)G = E(FG)$
Distributive laws   $(E \cup F)G = EG \cup FG$                  $EF \cup G = (E \cup G)(F \cup G)$


These relations are verified by showing that any outcome that is contained in the 
event on the left side of the equality sign is also contained in the event on the right 
side, and vice versa. One way of showing this is by means of Venn diagrams. For 
instance, the distributive law may be verified by the sequence of diagrams in Figure
2.3.


Figure 2.3 $(E \cup F)G = EG \cup FG$.

(a) Shaded region: $EG$.
(b) Shaded region: $FG$.
(c) Shaded region: $(E \cup F)G$.
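On a finite sample space, the same "left side contained in right side, and vice versa" check can also be carried out mechanically. The following is a minimal sketch, assuming the two-dice sample space of Example 4 and representing events as Python sets; the event G is a hypothetical choice made only for illustration.

```python
from itertools import product

# Sample space of Example 4: ordered outcomes of two dice.
S = frozenset(product(range(1, 7), repeat=2))

E = frozenset(o for o in S if sum(o) == 7)   # sum of the dice is 7
F = frozenset(o for o in S if sum(o) == 6)   # sum of the dice is 6
G = frozenset(o for o in S if o[0] == 1)     # first die shows 1 (illustrative)

# Commutative laws (| is union, & is intersection).
assert E | F == F | E and E & F == F & E
# Associative laws.
assert (E | F) | G == E | (F | G) and (E & F) & G == E & (F & G)
# Distributive laws.
assert (E | F) & G == (E & G) | (F & G)
assert (E & F) | G == (E | G) & (F | G)

# E and F are mutually exclusive: their intersection is the null event.
mutually_exclusive = (E & F == frozenset())
print(mutually_exclusive)
```

A check on one sample space is of course not a proof of the laws; it only confirms them for the particular events chosen.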


The following useful relationships among the three basic operations of forming 
unions, intersections, and complements are known as DeMorgan’s laws: 


$$\left(\bigcup_{i=1}^{n} E_i\right)^c = \bigcap_{i=1}^{n} E_i^c$$

$$\left(\bigcap_{i=1}^{n} E_i\right)^c = \bigcup_{i=1}^{n} E_i^c$$


For instance, for two events E and F, DeMorgan’s laws state that


$$(E \cup F)^c = E^c F^c \quad \text{and} \quad (EF)^c = E^c \cup F^c$$


which can be easily proven by using Venn diagrams (see Theoretical Exercise
7).


To prove DeMorgan’s laws for general n, suppose first that x is an outcome of
$\left(\bigcup_{i=1}^{n} E_i\right)^c$. Then x is not contained in $\bigcup_{i=1}^{n} E_i$, which means that x is not
contained in any of the events $E_i$, $i = 1, 2, \ldots, n$, implying that x is contained in $E_i^c$ for
all $i = 1, 2, \ldots, n$ and thus is contained in $\bigcap_{i=1}^{n} E_i^c$. To go the other way, suppose
that x is an outcome of $\bigcap_{i=1}^{n} E_i^c$. Then x is contained in $E_i^c$ for all $i = 1, 2, \ldots, n$,
which means that x is not contained in $E_i$ for any $i = 1, 2, \ldots, n$, implying that x is not
contained in $\bigcup_{i=1}^{n} E_i$, in turn implying that x is contained in $\left(\bigcup_{i=1}^{n} E_i\right)^c$. This proves the first
of DeMorgan’s laws.
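As a sanity check on the statement (not a substitute for the proof), both of DeMorgan's laws can be tested by brute force on randomly generated events. The sketch below assumes a small hypothetical sample space of 20 outcomes and a fixed random seed; the names `complement` and `de_morgan_holds` are illustrative.

```python
import random

S = frozenset(range(20))   # a small illustrative sample space

def complement(event):
    """Outcomes of S that are not in the event."""
    return S - event

def de_morgan_holds(events):
    """Check both of DeMorgan's laws for a nonempty list of events."""
    union = frozenset().union(*events)
    intersection = S.intersection(*events)
    complements = [complement(e) for e in events]
    first = complement(union) == S.intersection(*complements)
    second = complement(intersection) == frozenset().union(*complements)
    return first and second

rng = random.Random(0)
trials = [
    [frozenset(x for x in S if rng.random() < 0.5) for _ in range(4)]
    for _ in range(200)
]
all_hold = all(de_morgan_holds(events) for events in trials)
print(all_hold)
```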


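To prove the second of DeMorgan’s laws, we use the first law to obtain

$$\left(\bigcup_{i=1}^{n} E_i^c\right)^c = \bigcap_{i=1}^{n} (E_i^c)^c$$

which, since $(E^c)^c = E$, is equivalent to

$$\left(\bigcup_{i=1}^{n} E_i^c\right)^c = \bigcap_{i=1}^{n} E_i$$

Taking complements of both sides of the preceding equation yields the result we
seek, namely,

$$\bigcup_{i=1}^{n} E_i^c = \left(\bigcap_{i=1}^{n} E_i\right)^c$$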


2.3 Axioms of Probability 


One way of defining the probability of an event is in terms of its long run relative 
frequency. Such a definition usually goes as follows: We suppose that an 
experiment, whose sample space is S, is repeatedly performed under exactly the 
same conditions. For each event E of the sample space S, we define n(E) to be the
number of times in the first n repetitions of the experiment that the event E occurs.
Then P(E), the probability of the event E, is defined as

$$P(E) = \lim_{n \to \infty} \frac{n(E)}{n}$$

That is, P(E) is defined as the (limiting) proportion of time that E occurs. It is thus the 
limiting relative frequency of E. 
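To make the quantity n(E)/n concrete, here is a minimal simulation sketch in which E is the event that a coin lands heads. It assumes a pseudo-random generator as a stand-in for repeated physical experiments; the function name `relative_frequency` and its parameters are illustrative.

```python
import random

def relative_frequency(n, p_heads=0.5, seed=0):
    """Return n(E)/n for E = 'coin lands heads' in n simulated flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n))
    return heads / n

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

The printed proportions tend to settle near 0.5 as n grows, but a simulation can only suggest convergence; it cannot establish that the limit exists.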


Although the preceding definition is certainly intuitively pleasing and should always 
be kept in mind by the reader, it possesses a serious drawback: How do we know 
that n(E)/n will converge to some constant limiting value that will be the same for 
each possible sequence of repetitions of the experiment? For example, suppose that 
the experiment to be repeatedly performed consists of flipping a coin. How do we 
know that the proportion of heads obtained in the first n flips will converge to some 
value as n gets large? Also, even if it does converge to some value, how do we know 
that, if the experiment is repeatedly performed a second time, we shall obtain the 
same limiting proportion of heads? 


Proponents of the relative frequency definition of probability usually answer this 
objection by stating that the convergence of n(E)/n to a constant limiting value is an
assumption, or an axiom, of the system. However, to assume that n(E)/n will
necessarily converge to some constant value seems to be an extraordinarily 
complicated assumption. For, although we might indeed hope that such a constant 
limiting frequency exists, it does not at all seem to be a priori evident that this need 
be the case. In fact, would it not be more reasonable to assume a set of simpler and 
more self-evident axioms about probability and then attempt to prove that such a 
constant limiting frequency does in some sense exist? The latter approach is the 
modern axiomatic approach to probability theory that we shall adopt in this text. In 
particular, we shall assume that, for each event E in the sample space S, there exists 
a value P(E), referred to as the probability of E. We shall then assume that all these 
probabilities satisfy a certain set of axioms, which, we hope the reader will agree, is 
in accordance with our intuitive notion of probability. 


Consider an experiment whose sample space is S. For each event E of the sample 
space S, we assume that a number P(E) is defined and satisfies the following three 
axioms: 


The three axioms of probability 


Axiom 1 


$$0 \le P(E) \le 1$$


Axiom 2 


P(S) =1 


Axiom 3 
For any sequence of mutually exclusive events $E_1, E_2, \ldots$ (that is, events for
which $E_i E_j = \emptyset$ when $i \ne j$),


$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$$


We refer to P(E) as the probability of the event E. 
Thus, Axiom 1 states that the probability that the outcome of the experiment is an
outcome in E is some number between 0 and 1. Axiom 2 states that, with
probability 1, the outcome will be a point in the sample space S. Axiom 3 states
that, for any sequence of mutually exclusive events, the probability of at least one of 
these events occurring is just the sum of their respective probabilities. 


If we consider a sequence of events $E_1, E_2, \ldots$, where $E_1 = S$ and $E_i = \emptyset$ for
$i > 1$, then, because the events are mutually exclusive and because $S = \bigcup_{i=1}^{\infty} E_i$,
we have, from Axiom 3,

$$P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(\emptyset)$$

implying that

$$P(\emptyset) = 0$$

That is, the null event has probability 0 of occurring.


Note that it follows that, for any finite sequence of mutually exclusive events
$E_1, E_2, \ldots, E_n$,

$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) \tag{3.1}$$

This equation follows from Axiom 3 by defining $E_i$ as the null event for all values
of i greater than n. Axiom 3 is equivalent to Equation (3.1) when the sample
space is finite. (Why?) However, the added generality of Axiom 3 is necessary
when the sample space consists of an infinite number of points.


Example 3a 


If our experiment consists of tossing a coin and if we assume that a head is as
likely to appear as a tail, then we would have

$$P(\{H\}) = P(\{T\}) = \frac{1}{2}$$

On the other hand, if the coin were biased and we believed that a head were
twice as likely to appear as a tail, then we would have

$$P(\{H\}) = \frac{2}{3} \qquad P(\{T\}) = \frac{1}{3}$$


Example 3b 


If a die is rolled and we suppose that all six sides are equally likely to appear,
then we would have $P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \frac{1}{6}$.
From Axiom 3, it would thus follow that the probability of rolling an even
number would equal

$$P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \frac{1}{2}$$
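For a finite sample space, Examples 3a and 3b can be mirrored in a few lines of code. The sketch below assumes that a probability assignment is stored as a dictionary of point probabilities; the helper names `is_probability` and `prob` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def is_probability(p):
    """Check Axioms 1 and 2 for point probabilities on a finite sample space."""
    return all(0 <= v <= 1 for v in p.values()) and sum(p.values()) == 1

def prob(event, p):
    """P(event) as the sum of its point probabilities (finite additivity)."""
    return sum((p[o] for o in event), Fraction(0))

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
biased_coin = {"H": Fraction(2, 3), "T": Fraction(1, 3)}

p_even = prob({2, 4, 6}, fair_die)
print(is_probability(fair_die), is_probability(biased_coin), p_even)
```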


The assumption of the existence of a set function P, defined on the events of a 
sample space S and satisfying Axioms 1, 2, and 3, constitutes the modern
mathematical approach to probability theory. It is hoped that the reader will agree 
that the axioms are natural and in accordance with our intuitive concept of probability 
as related to chance and randomness. Furthermore, using these axioms, we shall be 
able to prove that if an experiment is repeated over and over again, then, with 
probability 1, the proportion of time during which any specific event E occurs will 
equal P(E). This result, known as the strong law of large numbers, is presented in 
Chapter 8. In addition, we present another possible interpretation of probability,
as being a measure of belief, in Section 2.7.


Technical Remark. We have supposed that P(E) is defined for all the events E of
the sample space. Actually, when the sample space is an uncountably infinite set, 
P(E) is defined only for a class of events called measurable. However, this restriction 
need not concern us, as all events of any practical interest are measurable. 


2.4 Some Simple Propositions 


In this section, we prove some simple propositions regarding probabilities. We first 
note that since E and $E^c$ are always mutually exclusive and since $E \cup E^c = S$, we
have, by Axioms 2 and 3,

$$1 = P(S) = P(E \cup E^c) = P(E) + P(E^c)$$

Or, equivalently, we have the following proposition.


Proposition 4.1


$$P(E^c) = 1 - P(E)$$


In words, Proposition 4.1 states that the probability that an event does not
occur is 1 minus the probability that it does occur. For instance, if the probability
of obtaining a head on the toss of a coin is $\frac{3}{8}$, then the probability of obtaining a
tail must be $\frac{5}{8}$.


Our second proposition states that if the event E is contained in the event F, then 
the probability of E is no greater than the probability of F. 


Proposition 4.2 


If $E \subset F$, then $P(E) \le P(F)$.
Proof. Since $E \subset F$, it follows that we can express F as

$$F = E \cup E^c F$$

Hence, because E and $E^c F$ are mutually exclusive, we obtain, from Axiom 3,

$$P(F) = P(E) + P(E^c F)$$

which proves the result, since $P(E^c F) \ge 0$.


Proposition 4.2 tells us, for instance, that the probability of rolling a 1 with a die is 
less than or equal to the probability of rolling an odd value with the die. 


The next proposition gives the relationship between the probability of the union of 
two events, expressed in terms of the individual probabilities, and the probability of 
the intersection of the events. 


Proposition 4.3


$$P(E \cup F) = P(E) + P(F) - P(EF)$$


Proof. To derive a formula for $P(E \cup F)$, we first note that $E \cup F$ can be written as
the union of the two disjoint events E and $E^c F$. Thus, from Axiom 3, we obtain

$$P(E \cup F) = P(E \cup E^c F) = P(E) + P(E^c F)$$

Furthermore, since $F = EF \cup E^c F$, we again obtain from Axiom 3

$$P(F) = P(EF) + P(E^c F)$$

or, equivalently,

$$P(E^c F) = P(F) - P(EF)$$

thereby completing the proof.


Proposition 4.3 could also have been proved by making use of the Venn diagram
in Figure 2.4.


Figure 2.4 Venn diagram.


Let us divide $E \cup F$ into three mutually exclusive sections, as shown in Figure 2.5.
In words, section I represents all the points in E that are not in F (that is, $EF^c$),
section II represents all points both in E and in F (that is, $EF$), and section III
represents all points in F that are not in E (that is, $E^c F$).


Figure 2.5 Venn diagram in sections.


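From Figure 2.5, we see that

$$E \cup F = \mathrm{I} \cup \mathrm{II} \cup \mathrm{III}$$
$$E = \mathrm{I} \cup \mathrm{II}$$
$$F = \mathrm{II} \cup \mathrm{III}$$

As I, II, and III are mutually exclusive, it follows from Axiom 3 that

$$P(E \cup F) = P(\mathrm{I}) + P(\mathrm{II}) + P(\mathrm{III})$$
$$P(E) = P(\mathrm{I}) + P(\mathrm{II})$$
$$P(F) = P(\mathrm{II}) + P(\mathrm{III})$$

which shows that

$$P(E \cup F) = P(E) + P(F) - P(\mathrm{II})$$

and Proposition 4.3 is proved, since $\mathrm{II} = EF$.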


Example 4a 


J is taking two books along on her holiday vacation. With probability .5, she will 
like the first book; with probability .4, she will like the second book; and with 
probability .3, she will like both books. What is the probability that she likes 
neither book? 


Solution 


Let $B_i$ denote the event that J likes book i, i = 1, 2. Then the probability that she
likes at least one of the books is

$$P(B_1 \cup B_2) = P(B_1) + P(B_2) - P(B_1 B_2) = .5 + .4 - .3 = .6$$

Because the event that J likes neither book is the complement of the event that
she likes at least one of them, we obtain the result

$$P(B_1^c B_2^c) = P((B_1 \cup B_2)^c) = 1 - P(B_1 \cup B_2) = .4$$
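The arithmetic of Example 4a can be reproduced exactly with rational numbers. This is only a small sketch; the function name `prob_union` is hypothetical and simply encodes Proposition 4.3.

```python
from fractions import Fraction

def prob_union(p_e, p_f, p_ef):
    """P(E or F) from P(E), P(F), and P(EF), as in Proposition 4.3."""
    return p_e + p_f - p_ef

p_b1, p_b2, p_both = Fraction(5, 10), Fraction(4, 10), Fraction(3, 10)
p_at_least_one = prob_union(p_b1, p_b2, p_both)
p_neither = 1 - p_at_least_one   # Proposition 4.1 applied to the union
print(p_at_least_one, p_neither)
```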


We may also calculate the probability that any one of the three events E, F, and G 
occurs, namely, 


$$P(E \cup F \cup G) = P[(E \cup F) \cup G]$$

which, by Proposition 4.3, equals

$$P(E \cup F) + P(G) - P[(E \cup F)G]$$

Now, it follows from the distributive law that the events $(E \cup F)G$ and $EG \cup FG$ are
equivalent; hence, from the preceding equations, we obtain

$$\begin{aligned}
P(E \cup F \cup G) &= P(E) + P(F) - P(EF) + P(G) - P(EG \cup FG) \\
&= P(E) + P(F) - P(EF) + P(G) - P(EG) - P(FG) + P(EGFG) \\
&= P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)
\end{aligned}$$


In fact, the following proposition, known as the inclusion—exclusion identity, can be 
proved by mathematical induction: 


Proposition 4.4 


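$$\begin{aligned}
P(E_1 \cup E_2 \cup \cdots \cup E_n) = {} & \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots \\
& + (-1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r}) \\
& + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)
\end{aligned}$$

The summation $\sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r})$ is taken over all of the $\binom{n}{r}$ possible
subsets of size r of the set $\{1, 2, \ldots, n\}$.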


In words, Proposition 4.4 states that the probability of the union of n events
equals the sum of the probabilities of these events taken one at a time, minus the 
sum of the probabilities of these events taken two at a time, plus the sum of the 
probabilities of these events taken three at a time, and so on. 
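The identity can be checked numerically on a finite sample space with equally likely outcomes. The sketch below assumes the two-dice sample space and three hypothetical events; `inclusion_exclusion` sums the signed intersection probabilities over all subsets of each size r and is compared with the probability of the union computed directly.

```python
from fractions import Fraction
from itertools import combinations, product

S = list(product(range(1, 7), repeat=2))   # two dice, outcomes equally likely

def prob(event):
    """Probability of an event when all outcomes of S are equally likely."""
    return Fraction(len(event), len(S))

def inclusion_exclusion(events):
    """Right-hand side of the inclusion-exclusion identity."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            total += sign * prob(frozenset.intersection(*group))
    return total

events = [
    frozenset(o for o in S if o[0] == 6),      # first die shows 6
    frozenset(o for o in S if o[1] == 6),      # second die shows 6
    frozenset(o for o in S if sum(o) >= 10),   # sum is at least 10
]
direct = prob(frozenset.union(*events))
via_identity = inclusion_exclusion(events)
print(direct, via_identity)
```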


Remarks 1. For a noninductive argument for Proposition 4.4, note first that if
an outcome of the sample space is not a member of any of the sets $E_i$, then its
probability does not contribute anything to either side of the equality. Now,
suppose that an outcome is in exactly m of the events $E_i$, where m > 0. Then,
since it is in $\bigcup_i E_i$, its probability is counted once in $P\left(\bigcup_i E_i\right)$; also, as this
outcome is contained in $\binom{m}{k}$ subsets of the type $E_{i_1} E_{i_2} \cdots E_{i_k}$, its probability is
counted

$$\binom{m}{1} - \binom{m}{2} + \binom{m}{3} - \cdots \pm \binom{m}{m}$$

times on the right of the equality sign in Proposition 4.4. Thus, for m > 0, we
must show that

$$1 = \binom{m}{1} - \binom{m}{2} + \binom{m}{3} - \cdots \pm \binom{m}{m}$$

However, since $1 = \binom{m}{0}$, the preceding equation is equivalent to

$$\sum_{i=0}^{m} \binom{m}{i} (-1)^i = 0$$

and the latter equation follows from the binomial theorem, since

$$0 = (-1 + 1)^m = \sum_{i=0}^{m} \binom{m}{i} (-1)^i (1)^{m-i}$$



2. The following is a succinct way of writing the inclusion-exclusion identity:

$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{r=1}^{n} (-1)^{r+1} \sum_{i_1 < \cdots < i_r} P(E_{i_1} \cdots E_{i_r})$$


3. In the inclusion-exclusion identity, going out one term results in an upper
bound on the probability of the union, going out two terms results in a
lower bound on the probability, going out three terms results in an upper
bound on the probability, going out four terms results in a lower bound,
and so on. That is, for events $E_1, \ldots, E_n$, we have

$$P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i) \tag{4.1}$$

$$P\left(\bigcup_{i=1}^{n} E_i\right) \ge \sum_{i=1}^{n} P(E_i) - \sum_{j<i} P(E_i E_j) \tag{4.2}$$

$$P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i) - \sum_{j<i} P(E_i E_j) + \sum_{k<j<i} P(E_i E_j E_k) \tag{4.3}$$

and so on. To prove the validity of these bounds, note the identity 


$$\bigcup_{i=1}^{n} E_i = E_1 \cup E_1^c E_2 \cup E_1^c E_2^c E_3 \cup \cdots \cup E_1^c \cdots E_{n-1}^c E_n$$

That is, at least one of the events $E_i$ occurs if $E_1$ occurs, or if $E_1$ does not occur
but $E_2$ does, or if $E_1$ and $E_2$ do not occur but $E_3$ does, and so on. Because the
right-hand side is the union of disjoint events, we obtain 


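$$\begin{aligned}
P\left(\bigcup_{i=1}^{n} E_i\right) &= P(E_1) + P(E_1^c E_2) + P(E_1^c E_2^c E_3) + \cdots + P(E_1^c \cdots E_{n-1}^c E_n) \\
&= P(E_1) + \sum_{i=2}^{n} P(E_1^c \cdots E_{i-1}^c E_i)
\end{aligned} \tag{4.4}$$

Now, let $B_i = E_1^c \cdots E_{i-1}^c = \left(\bigcup_{j<i} E_j\right)^c$ be the event that none of the first $i - 1$
events occurs. Applying the identity

$$P(E_i) = P(B_i E_i) + P(B_i^c E_i)$$

shows that

$$P(E_i) = P(E_1^c \cdots E_{i-1}^c E_i) + P\left(E_i \bigcup_{j<i} E_j\right)$$

or, equivalently,

$$P(E_1^c \cdots E_{i-1}^c E_i) = P(E_i) - P\left(\bigcup_{j<i} E_i E_j\right)$$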

Substituting this equation into (4.4) yields 


$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum_{i=1}^{n} P\left(\bigcup_{j<i} E_i E_j\right) \tag{4.5}$$

Because probabilities are always nonnegative, Inequality (4.1) follows directly
from Equation (4.5). Now, fixing i and applying Inequality (4.1) to $P\left(\bigcup_{j<i} E_i E_j\right)$
yields

$$P\left(\bigcup_{j<i} E_i E_j\right) \le \sum_{j<i} P(E_i E_j)$$

which, by Equation (4.5), gives Inequality (4.2). Similarly, fixing i and applying
Inequality (4.2) to $P\left(\bigcup_{j<i} E_i E_j\right)$ yields

$$P\left(\bigcup_{j<i} E_i E_j\right) \ge \sum_{j<i} P(E_i E_j) - \sum_{k<j<i} P(E_i E_j E_i E_k) = \sum_{j<i} P(E_i E_j) - \sum_{k<j<i} P(E_i E_j E_k)$$

which, by Equation (4.5), gives Inequality (4.3). The next inclusion-exclusion
inequality is now obtained by fixing i and applying Inequality (4.3) to
$P\left(\bigcup_{j<i} E_i E_j\right)$, and so on.


The first inclusion-exclusion inequality, namely that

$$P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i)$$



is known as Boole’s inequality. 
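The alternating upper and lower bounds can be observed directly by computing the partial sums of the inclusion-exclusion identity. The following sketch assumes a hypothetical sample space of 12 equally likely outcomes and four illustrative events; `partial_sums` returns the result of "going out" one term, two terms, and so on.

```python
from fractions import Fraction
from itertools import combinations

S = range(12)   # 12 equally likely outcomes (illustrative)

def prob(event):
    return Fraction(len(event), len(S))

events = [
    frozenset({0, 1, 2, 3}),
    frozenset({2, 3, 4, 5}),
    frozenset({3, 5, 6, 7}),
    frozenset({0, 3, 7, 8}),
]

def partial_sums(events):
    """Partial sums of the inclusion-exclusion identity, one per term kept."""
    sums, total = [], Fraction(0)
    for r in range(1, len(events) + 1):
        term = sum(
            (prob(frozenset.intersection(*g)) for g in combinations(events, r)),
            Fraction(0),
        )
        total += (-1) ** (r + 1) * term
        sums.append(total)
    return sums

exact = prob(frozenset.union(*events))
bounds = partial_sums(events)
print(exact, bounds)
```

The first partial sum is Boole's inequality; the last one equals the exact probability of the union.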


2.5 Sample Spaces Having Equally Likely 
Outcomes 


In many experiments, it is natural to assume that all outcomes in the sample space 
are equally likely to occur. That is, consider an experiment whose sample space S is 
a finite set, say, $S = \{1, 2, \ldots, N\}$. Then, it is often natural to assume that

$$P(\{1\}) = P(\{2\}) = \cdots = P(\{N\})$$

which implies, from Axioms 2 and 3 (why?), that

$$P(\{i\}) = \frac{1}{N} \qquad i = 1, 2, \ldots, N$$

From this equation, it follows from Axiom 3 that, for any event E,

$$P(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } S}$$


In words, if we assume that all outcomes of an experiment are equally likely to occur, 
then the probability of any event E equals the proportion of outcomes in the sample 
space that are contained in E. 


Example 5a 


If two dice are rolled, what is the probability that the sum of the upturned faces 
will equal 7? 


Solution 


We shall solve this problem under the assumption that all of the 36 possible
outcomes are equally likely. Since there are 6 possible outcomes, namely, (1, 6),
(2, 5), (3, 4), (4, 3), (5, 2), and (6, 1), that result in the sum of the dice being
equal to 7, the desired probability is $\frac{6}{36} = \frac{1}{6}$.
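Counting of this kind is easy to confirm by enumerating the sample space. A minimal sketch, assuming ordered pairs for the two dice:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))    # all 36 ordered outcomes
favorable = [o for o in outcomes if sum(o) == 7]   # outcomes with sum 7
p_seven = Fraction(len(favorable), len(outcomes))
print(len(favorable), len(outcomes), p_seven)
```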


Example 5b 


If 3 balls are “randomly drawn” from a bowl containing 6 white and 5 black balls, 
what is the probability that one of the balls is white and the other two black? 


Solution 


If we regard the balls as being distinguishable and the order in which they are
selected as being relevant, then the sample space consists of $11 \cdot 10 \cdot 9 = 990$
outcomes. Furthermore, there are $6 \cdot 5 \cdot 4 = 120$ outcomes in which the first ball
selected is white and the other two are black; $5 \cdot 6 \cdot 4 = 120$ outcomes in which
the first is black, the second is white, and the third is black; and $5 \cdot 4 \cdot 6 = 120$ in
which the first two are black and the third is white. Hence, assuming that
“randomly drawn” means that each outcome in the sample space is equally likely
to occur, we see that the desired probability is

$$\frac{120 + 120 + 120}{990} = \frac{4}{11}$$


This problem could also have been solved by regarding the outcome of the
experiment as the unordered set of drawn balls. From this point of view, there are
$\binom{11}{3} = 165$ outcomes in the sample space. Now, each set of 3 balls corresponds
to 3! outcomes when the order of selection is noted. As a result, if all outcomes
are assumed equally likely when the order of selection is noted, then it follows
that they remain equally likely when the outcome is taken to be the unordered set
of selected balls. Hence, using the latter representation of the experiment, we
see that the desired probability is

$$\frac{\binom{6}{1}\binom{5}{2}}{\binom{11}{3}} = \frac{4}{11}$$

which, of course, agrees with the answer obtained previously.
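Both points of view can be enumerated explicitly. The sketch below assumes the balls are made distinguishable by their index in a list, then counts ordered draws (permutations) and unordered draws (combinations) that contain exactly one white ball.

```python
from fractions import Fraction
from itertools import combinations, permutations

balls = ["w"] * 6 + ["b"] * 5   # 6 white and 5 black balls
indices = range(len(balls))     # the index makes each ball distinguishable

def one_white_two_black(draw):
    """True if the 3 drawn balls consist of 1 white and 2 black."""
    return [balls[i] for i in draw].count("w") == 1

ordered = list(permutations(indices, 3))     # order of selection noted
unordered = list(combinations(indices, 3))   # unordered sets of 3 balls

p_ordered = Fraction(sum(map(one_white_two_black, ordered)), len(ordered))
p_unordered = Fraction(sum(map(one_white_two_black, unordered)), len(unordered))
print(len(ordered), len(unordered), p_ordered, p_unordered)
```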


When the experiment consists of a random selection of k items from a set of n items,
we have the flexibility of either letting the outcome of the experiment be the ordered
selection of the k items or letting it be the unordered set of items selected. In the
former case, we would assume that each new selection is equally likely to be any of
the so far unselected items of the set, and in the latter case, we would assume that
all $\binom{n}{k}$ possible subsets of k items are equally likely to be the set selected. For
instance, suppose 5 people are to be randomly selected from a group of 20
individuals consisting of 10 married couples, and we want to determine P(N), the
probability that the 5 chosen are all unrelated. (That is, no two are married to each
other.) If we regard the sample space as the set of 5 people chosen, then there are
$\binom{20}{5}$ equally likely outcomes. An outcome that does not contain a married couple
can be thought of as being the result of a six-stage experiment: In the first stage, 5 of
the 10 couples to have a member in the group are chosen; in the next 5 stages, 1 of
the 2 members of each of these couples is selected. Thus, there are $\binom{10}{5} 2^5$ possible
outcomes in which the 5 members selected are unrelated, yielding the desired
probability of

$$P(N) = \frac{\binom{10}{5} 2^5}{\binom{20}{5}}$$

In contrast, we could let the outcome of the experiment be the ordered selection of
the 5 individuals. In this setting, there are $20 \cdot 19 \cdot 18 \cdot 17 \cdot 16$ equally likely
outcomes, of which $20 \cdot 18 \cdot 16 \cdot 14 \cdot 12$ outcomes result in a group of 5 unrelated
individuals, yielding the result

$$P(N) = \frac{20 \cdot 18 \cdot 16 \cdot 14 \cdot 12}{20 \cdot 19 \cdot 18 \cdot 17 \cdot 16}$$


We leave it for the reader to verify that the two answers are identical. 
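One way to carry out that verification is with exact rational arithmetic. The sketch below evaluates the unordered count (choose 5 of the 10 couples, then 1 member of each) and the ordered count, and compares the two fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, prod

# Unordered view: C(10, 5) * 2^5 favorable sets out of C(20, 5).
p_unordered = Fraction(comb(10, 5) * 2**5, comb(20, 5))

# Ordered view: 20*18*16*14*12 favorable selections out of 20*19*18*17*16.
p_ordered = Fraction(prod([20, 18, 16, 14, 12]), prod([20, 19, 18, 17, 16]))

print(p_unordered, p_ordered, float(p_unordered))
```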


Example 5c 


A committee of 5 is to be selected from a group of 6 men and 9 women. If the 
selection is made randomly, what is the probability that the committee consists of 
3 men and 2 women? 


Solution 


Because each of the $\binom{15}{5}$ possible committees is equally likely to be selected,
the desired probability is

$$\frac{\binom{6}{3}\binom{9}{2}}{\binom{15}{5}} = \frac{240}{1001}$$


Example 5d


An urn contains n balls, one of which is special. If k of these balls are withdrawn 
one at a time, with each selection being equally likely to be any of the balls that 
remain at the time, what is the probability that the special ball is chosen? 


Solution 
Since all of the balls are treated in an identical manner, it follows that the set of k
balls selected is equally likely to be any of the $\binom{n}{k}$ sets of k balls. Therefore,

$$P\{\text{special ball is selected}\} = \frac{\binom{1}{1}\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n}$$

We could also have obtained this result by letting $A_i$ denote the event that the
special ball is the ith ball to be chosen, $i = 1, \ldots, k$. Then, since each one of the n
balls is equally likely to be the ith ball chosen, it follows that $P(A_i) = 1/n$. Hence,
because these events are clearly mutually exclusive, we have

$$P\{\text{special ball is selected}\} = P\left(\bigcup_{i=1}^{k} A_i\right) = \sum_{i=1}^{k} P(A_i) = \frac{k}{n}$$

We could also have argued that $P(A_i) = 1/n$ by noting that there are
$n(n-1) \cdots (n-k+1) = n!/(n-k)!$ equally likely outcomes of the experiment, of
which $(n-1)(n-2) \cdots (n-i+1)(1)(n-i) \cdots (n-k+1) = (n-1)!/(n-k)!$
result in the special ball being the ith one chosen. From this reasoning, it follows
that

$$P(A_i) = \frac{(n-1)!}{n!} = \frac{1}{n}$$


Example 5e 


Suppose that n + m balls, of which n are red and m are blue, are arranged in a
linear order in such a way that all (n + m)! possible orderings are equally likely. If
we record the result of this experiment by listing only the colors of the successive
balls, show that all the possible results remain equally likely.


Solution 


Consider any one of the (n + m)! possible orderings, and note that any 
permutation of the red balls among themselves and of the blue balls among 
themselves does not change the sequence of colors. As a result, every ordering
of colorings corresponds to n! m! different orderings of the n + m balls, so every
ordering of the colors has probability $\frac{n!\,m!}{(n+m)!}$ of occurring.
For example, suppose that there are 2 red balls, numbered $r_1, r_2$, and 2 blue
balls, numbered $b_1, b_2$. Then, of the 4! possible orderings, there will be 2! 2!
orderings that result in any specified color combination. For instance, the
following orderings result in the successive balls alternating in color, with a red
ball first:

$$r_1, b_1, r_2, b_2 \qquad r_1, b_2, r_2, b_1 \qquad r_2, b_1, r_1, b_2 \qquad r_2, b_2, r_1, b_1$$

Therefore, each of the possible orderings of the colors has probability $\frac{4}{24} = \frac{1}{6}$ of
occurring.
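The small case with 2 red and 2 blue balls can be enumerated in full. The sketch below lists all 4! orderings of the labeled balls, records only the color pattern of each, and counts how many orderings produce each pattern; the labels `r1`, `r2`, `b1`, `b2` follow the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

balls = ["r1", "r2", "b1", "b2"]

# Map each of the 4! orderings to its color pattern, e.g. "rbrb".
pattern_counts = Counter(
    "".join(ball[0] for ball in ordering) for ordering in permutations(balls)
)
probabilities = {pat: Fraction(c, 24) for pat, c in pattern_counts.items()}
print(pattern_counts)
```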


Example 5f 


A poker hand consists of 5 cards. If the cards have distinct consecutive values 
and are not all of the same suit, we say that the hand is a straight. For instance, 
a hand consisting of the five of spades, six of spades, seven of spades, eight of 
spades, and nine of hearts is a straight. What is the probability that one is dealt a 
straight? 


Solution 


We start by assuming that all $\binom{52}{5}$ possible poker hands are equally likely. To
determine the number of outcomes that are straights, let us first determine the
number of possible outcomes for which the poker hand consists of an ace, two,
three, four, and five (the suits being irrelevant). Since the ace can be any 1 of the
4 possible aces, and similarly for the two, three, four, and five, it follows that there
are $4^5$ outcomes leading to exactly one ace, two, three, four, and five. Hence,
since in 4 of these outcomes all the cards will be of the same suit (such a hand is
called a straight flush), it follows that there are $4^5 - 4$ hands that make up a
straight of the form ace, two, three, four, and five. Similarly, there are $4^5 - 4$
hands that make up a straight of the form ten, jack, queen, king, and ace. Thus,
there are $10(4^5 - 4)$ hands that are straights, and it follows that the desired
probability is

$$\frac{10(4^5 - 4)}{\binom{52}{5}} \approx .0039$$
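The count of straights can be reproduced by a direct constructive enumeration: pick the lowest card of the run, then assign a suit to each of the five cards, and discard the assignments in which all suits agree. This is a sketch under the convention used in the example, namely that the ace may be low (ace through five) or high (ten through ace), which gives 10 possible runs.

```python
from itertools import product
from math import comb

SUITS = "shdc"

straights = 0
for low in range(1, 11):                     # lowest card: ace (1) through ten
    for suits in product(SUITS, repeat=5):   # one suit for each card in the run
        if len(set(suits)) > 1:              # exclude straight flushes
            straights += 1

p_straight = straights / comb(52, 5)
print(straights, p_straight)
```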
Example 5g 


A 5-card poker hand is said to be a full house if it consists of 3 cards of the same 
denomination and 2 other cards of the same denomination (of course, different 
from the first denomination). Thus, a full house is three of a kind plus a pair. What 
is the probability that one is dealt a full house? 


Solution 
Again, we assume that all $\binom{52}{5}$ possible hands are equally likely. To determine
the number of possible full houses, we first note that there are $\binom{4}{2}\binom{4}{3}$ different
combinations of, say, 2 tens and 3 jacks. Because there are 13 different choices
for the kind of pair and, after a pair has been chosen, there are 12 other choices
for the denomination of the remaining 3 cards, it follows that the probability of a
full house is

$$\frac{13 \cdot 12 \cdot \binom{4}{2}\binom{4}{3}}{\binom{52}{5}} \approx .0014$$


Example 5h 


In the game of bridge, the entire deck of 52 cards is dealt out to 4 players. What 
is the probability that 


a. one of the players receives all 13 spades; 
b. each player receives 1 ace? 


Solution 


a. Letting $E_i$ be the event that hand i has all 13 spades, then

$$P(E_i) = \frac{1}{\binom{52}{13}} \qquad i = 1, 2, 3, 4$$

Because the events $E_i$, i = 1, 2, 3, 4, are mutually exclusive, the probability
that one of the hands is dealt all 13 spades is

$$P\left(\bigcup_{i=1}^{4} E_i\right) = \sum_{i=1}^{4} P(E_i) = 4 \Big/ \binom{52}{13} \approx 6.3 \times 10^{-12}$$

b. Let the outcome of the experiment be the sets of 13 cards of each of the
players 1, 2, 3, 4. To determine the number of outcomes in which each of
the distinct players receives exactly 1 ace, put aside the aces and note
that there are $\binom{48}{12, 12, 12, 12}$ possible divisions of the other 48 cards
when each player is to receive 12. Because there are 4! ways of dividing
the 4 aces so that each player receives 1, we see that the number of
possible outcomes in which each player receives exactly 1 ace is
$4! \binom{48}{12, 12, 12, 12}$.

As there are $\binom{52}{13, 13, 13, 13}$ possible hands, the desired probability is thus

$$\frac{4! \binom{48}{12, 12, 12, 12}}{\binom{52}{13, 13, 13, 13}} \approx .1055$$


Some results in probability are quite surprising when initially encountered. Our next
two examples illustrate this phenomenon.


Example 5i 


If n people are present in a room, what is the probability that no two of them
celebrate their birthday on the same day of the year? How large need n be so
that this probability is less than $\frac{1}{2}$?


Solution 


As each person can celebrate his or her birthday on any one of 365 days, there
are a total of $(365)^n$ possible outcomes. (We are ignoring the possibility of
someone having been born on February 29.) Assuming that each outcome is
equally likely, we see that the desired probability is
$(365)(364)(363) \cdots (365 - n + 1)/(365)^n$. It is a rather surprising fact that when
$n \ge 23$, this probability is less than $\frac{1}{2}$. That is, if there are 23 or more people in a
room, then the probability that at least two of them have the same birthday
exceeds $\frac{1}{2}$. Many people are initially surprised by this result, since 23 seems so
small in relation to 365, the number of days of the year. However, every pair of
individuals has probability $\frac{365}{(365)^2} = \frac{1}{365}$ of having the same birthday, and in a
group of 23 people, there are $\binom{23}{2} = 253$ different pairs of individuals. Looked at
this way, the result no longer seems so surprising.


When there are 50 people in the room, the probability that at least two share the
same birthday is approximately .970, and with 100 persons in the room, the odds
are better than 3,000,000:1. (That is, the probability is greater than $\frac{3 \times 10^6}{3 \times 10^6 + 1}$
that at least two people have the same birthday.)
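The figures quoted in this example are straightforward to recompute. The sketch below evaluates the product (365)(364)···(365 − n + 1)/(365)^n in floating point, under the same assumptions as the example (365 equally likely birthdays, February 29 ignored); the function name `p_all_distinct` is illustrative.

```python
def p_all_distinct(n, days=365):
    """Probability that n people all have different birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

# Smallest n for which the probability of no shared birthday drops below 1/2.
smallest_n = next(n for n in range(1, 366) if p_all_distinct(n) < 0.5)

p_shared_50 = 1 - p_all_distinct(50)   # at least two of 50 share a birthday
p100 = p_all_distinct(100)
odds_100 = (1 - p100) / p100           # odds in favor of a shared birthday

print(smallest_n, round(p_shared_50, 3), odds_100)
```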


Example 5j 


A deck of 52 playing cards is shuffled, and the cards are turned up one at a time 
until the first ace appears. Is the next card—that is, the card following the first 
ace—more likely to be the ace of spades or the two of clubs? 


Solution 


To determine the probability that the card following the first ace is the ace of 
spades, we need to calculate how many of the (52)! possible orderings of the 
cards have the ace of spades immediately following the first ace. To begin, note 
that each ordering of the 52 cards can be obtained by first ordering the 51 cards 
different from the ace of spades and then inserting the ace of spades into that 
ordering. Furthermore, for each of the (51)! orderings of the other cards, there is 
only one place where the ace of spades can be placed so that it follows the first 
ace. For instance, if the ordering of the other 51 cards is 


4c, 6h, Jd, 5s, Ac, 7d, ..., Kh

then the only insertion of the ace of spades into this ordering that results in its
following the first ace is

4c, 6h, Jd, 5s, Ac, As, 7d, ..., Kh

Therefore, there are (51)! orderings that result in the ace of spades following the
first ace, so

$$P\{\text{the ace of spades follows the first ace}\} = \frac{(51)!}{(52)!} = \frac{1}{52}$$

In fact, by exactly the same argument, it follows that the probability that the two
of clubs (or any other specified card) follows the first ace is also $\frac{1}{52}$. In other


words, each of the 52 cards of the deck is equally likely to be the one that follows 
the first ace! 


Many people find this result rather surprising. Indeed, a common reaction is to 
suppose initially that it is more likely that the two of clubs (rather than the ace of 
spades) follows the first ace, since that first ace might itself be the ace of spades. 
This reaction is often followed by the realization that the two of clubs might itself 
appear before the first ace, thus negating its chance of immediately following the 
first ace. However, as there is one chance in four that the ace of spades will be 
the first ace (because all 4 aces are equally likely to be first) and only one chance 
in five that the two of clubs will appear before the first ace (because each of the 
set of 5 cards consisting of the two of clubs and the 4 aces is equally likely to be 
the first of this set to appear), it again appears that the two of clubs is more likely. 
However, this is not the case, and our more complete analysis shows that they 
are equally likely. 


Example 5k 


A football team consists of 20 offensive and 20 defensive players. The players 
are to be paired in groups of 2 for the purpose of determining roommates. If the 
pairing is done at random, what is the probability that there are no 
offensive-defensive roommate pairs? What is the probability that there are 2i 
offensive-defensive roommate pairs, i = 1, 2, ..., 10? 


Solution 


There are 

$$\binom{40}{2, 2, \ldots, 2} = \frac{(40)!}{(2!)^{20}}$$

ways of dividing the 40 players into 20 ordered pairs of two each. (That is, there 
are $(40)!/2^{20}$ ways of dividing the players into a first pair, a second pair, and so 
on.) Hence, there are $(40)!/[2^{20}(20)!]$ ways of dividing the players into 
(unordered) pairs of 2 each. Furthermore, since a division will result in no 
offensive-defensive pairs if the offensive (and defensive) players are paired 
among themselves, it follows that there are $[(20)!/(2^{10}(10)!)]^2$ such divisions. 
Hence, the probability of no offensive-defensive roommate pairs, call it $P_0$, is 
given by 

$$P_0 = \frac{\left(\dfrac{(20)!}{2^{10}(10)!}\right)^2}{\dfrac{(40)!}{2^{20}(20)!}} = \frac{[(20)!]^3}{[(10)!]^2(40)!}$$


To determine $P_{2i}$, the probability that there are 2i offensive-defensive pairs, we 
first note that there are $\binom{20}{2i}^2$ ways of selecting the 2i offensive players and the 
2i defensive players who are to be in the offensive-defensive pairs. These 4i 
players can then be paired up into (2i)! possible offensive-defensive pairs. (This 
is so because the first offensive player can be paired with any of the 2i defensive 
players, the second offensive player with any of the remaining 2i − 1 defensive 
players, and so on.) As the remaining 20 − 2i offensive (and defensive) players 
must be paired among themselves, it follows that there are 

$$\binom{20}{2i}^2 (2i)! \left[\frac{(20-2i)!}{2^{10-i}(10-i)!}\right]^2$$

divisions that lead to 2i offensive-defensive pairs. Hence, 

$$P_{2i} = \frac{\dbinom{20}{2i}^2 (2i)! \left[\dfrac{(20-2i)!}{2^{10-i}(10-i)!}\right]^2}{\dfrac{(40)!}{2^{20}(20)!}}, \qquad i = 0, 1, \ldots, 10$$


The $P_{2i}$, i = 0, 1, ..., 10, can now be computed, or they can be approximated by 
making use of a result of Stirling, which shows that n! can be approximated by 
$n^{n+1/2}e^{-n}\sqrt{2\pi}$. For instance, we obtain 

$$P_0 \approx 1.3403 \times 10^{-6}$$
$$P_{10} \approx .345861$$
$$P_{20} \approx 7.6068 \times 10^{-6}$$
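

Because Python integers have arbitrary precision, the $P_{2i}$ can also be evaluated exactly rather than through Stirling's approximation. The following is a minimal sketch of the counting formula above; the helper names `pairings` and `p_mixed_pairs` and the parameter `half` are illustrative choices, not notation from the text.

```python
from fractions import Fraction
from math import comb, factorial


def pairings(k):
    """Number of ways to split 2k people into k unordered pairs: (2k)! / (2^k k!)."""
    return factorial(2 * k) // (2 ** k * factorial(k))


def p_mixed_pairs(i, half=10):
    """P{exactly 2i offensive-defensive pairs} with 2*half players on each side."""
    side = 2 * half                   # 20 offensive and 20 defensive players
    favorable = (
        comb(side, 2 * i) ** 2        # choose the 2i players from each side
        * factorial(2 * i)            # match them across the two sides
        * pairings(half - i) ** 2     # pair the remaining players within each side
    )
    return Fraction(favorable, pairings(side))


probs = [p_mixed_pairs(i) for i in range(11)]
for i in (0, 5, 10):
    print(f"P_{2 * i} = {float(probs[i]):.6g}")
```

Since the number of offensive-defensive pairs must be even, the eleven probabilities sum to exactly 1, which is a convenient check on the formula.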


Our next three examples illustrate the usefulness of the inclusion-exclusion identity 
(Proposition 4.4). In Example 5l, the introduction of probability enables us to 
obtain a quick solution to a counting problem. 


Example 5l 


A total of 36 members of a club play tennis, 28 play squash, and 18 play 
badminton. Furthermore, 22 of the members play both tennis and squash, 12 
play both tennis and badminton, 9 play both squash and badminton, and 4 play 
all three sports. How many members of this club play at least one of the three 
sports? 


Solution 


Let N denote the number of members of the club, and introduce probability by 
assuming that a member of the club is randomly selected. If, for any subset C of 
members of the club, we let P(C) denote the probability that the selected member 
is contained in C, then 


$$P(C) = \frac{\text{number of members in } C}{N}$$


Now, with T being the set of members that plays tennis, S being the set that plays 
squash, and B being the set that plays badminton, we have, from Proposition 
4.4 


$$\begin{aligned}
P(T \cup S \cup B) &= P(T) + P(S) + P(B) - P(TS) - P(TB) - P(SB) + P(TSB) \\
&= \frac{36 + 28 + 18 - 22 - 12 - 9 + 4}{N} \\
&= \frac{43}{N}
\end{aligned}$$


Hence, we can conclude that 43 members play at least one of the sports. 
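

The arithmetic of the three-set inclusion-exclusion count can be written out directly. This is a small sketch; the function name `at_least_one` and its argument layout are assumptions made for illustration.

```python
def at_least_one(singles, pairs, triple):
    """Three-set inclusion-exclusion: sum of singles, minus pairwise overlaps, plus the triple overlap."""
    return sum(singles) - sum(pairs) + triple


# Tennis, squash, badminton counts from the example.
n_union = at_least_one(singles=[36, 28, 18], pairs=[22, 12, 9], triple=4)
print(n_union)
```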


The next example in this section not only possesses the virtue of giving rise to a 
somewhat surprising answer, but is also of theoretical interest. 


Example 5m The Matching Problem 


Suppose that each of N men at a party throws his hat into the center of the room. 
The hats are first mixed up, and then each man randomly selects a hat. What is 
the probability that none of the men selects his own hat? 


Solution 


We first calculate the complementary probability of at least one man selecting his 
own hat. Let us denote by $E_i$, i = 1, 2, ..., N the event that the ith man selects his 
own hat. Now, by the inclusion-exclusion identity, $P\left(\bigcup_{i=1}^{N} E_i\right)$, the probability that 
at least one of the men selects his own hat, is given by 

$$\begin{aligned}
P\left(\bigcup_{i=1}^{N} E_i\right) = {} & \sum_{i=1}^{N} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1}E_{i_2}) + \cdots \\
& + (-1)^{n+1} \sum_{i_1 < i_2 < \cdots < i_n} P(E_{i_1}E_{i_2}\cdots E_{i_n}) \\
& + \cdots + (-1)^{N+1} P(E_1E_2\cdots E_N)
\end{aligned}$$


If we regard the outcome of this experiment as a vector of N numbers, where the 
ith element is the number of the hat drawn by the ith man, then there are N! 
possible outcomes. [The outcome (1, 2, 3, ..., N) means, for example, that each 
man selects his own hat.] Furthermore, $E_{i_1}E_{i_2}\cdots E_{i_n}$, the event that each of the n 
men $i_1, i_2, \ldots, i_n$ selects his own hat, can occur in any of 
$(N-n)(N-n-1)\cdots 3 \cdot 2 \cdot 1 = (N-n)!$ possible ways; for, of the remaining N − n 
men, the first can select any of N − n hats, the second can then select any of 
N − n − 1 hats, and so on. Hence, assuming that all N! possible outcomes are 
equally likely, we see that 

$$P(E_{i_1}E_{i_2}\cdots E_{i_n}) = \frac{(N-n)!}{N!}$$

Also, as there are $\binom{N}{n}$ terms in $\sum_{i_1 < i_2 < \cdots < i_n} P(E_{i_1}E_{i_2}\cdots E_{i_n})$, it follows that 

$$\sum_{i_1 < i_2 < \cdots < i_n} P(E_{i_1}E_{i_2}\cdots E_{i_n}) = \frac{N!}{(N-n)!\,n!} \cdot \frac{(N-n)!}{N!} = \frac{1}{n!}$$


Thus, 


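$$P\left(\bigcup_{i=1}^{N} E_i\right) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{N+1}\frac{1}{N!}$$

Hence, the probability that none of the men selects his own hat is 

$$1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{N}}{N!} = \sum_{i=0}^{N} \frac{(-1)^i}{i!}$$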

Upon letting x = −1 in the identity $e^x = \sum_{i=0}^{\infty} x^i/i!$, the preceding probability 
when N is large is seen to be approximately equal to $e^{-1} \approx .3679$. In other words, 
for N large, the probability that none of the men selects his own hat is 
approximately .37. (How many readers would have incorrectly thought that this 
probability would go to 1 as N → ∞?) 
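

For small N the matching probability can be verified by listing all N! hat assignments and comparing with the alternating sum. The sketch below does this; the function names `p_no_match` and `p_no_match_brute` are illustrative and not from the text.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def p_no_match(n):
    """Inclusion-exclusion value: sum over i = 0..n of (-1)^i / i!."""
    return sum(Fraction((-1) ** i, factorial(i)) for i in range(n + 1))


def p_no_match_brute(n):
    """Fraction of the n! hat assignments in which no man gets his own hat."""
    perms = list(permutations(range(n)))
    good = sum(all(hat != man for man, hat in enumerate(p)) for p in perms)
    return Fraction(good, len(perms))


for n in range(1, 8):
    print(n, p_no_match(n), p_no_match_brute(n), float(p_no_match(n)))
print("e^-1 =", exp(-1))
```

The convergence to $e^{-1}$ is fast: the gap after N terms is at most $1/(N+1)!$, so the probability is already close to .3679 for quite modest N.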


For another illustration of the usefulness of the inclusion-exclusion identity, consider 
the following example. 


Example 5n 


Compute the probability that if 10 married couples are seated at random at a 
round table, then no wife sits next to her husband. 


Solution 


If we let $E_i$, i = 1, 2, ..., 10 denote the event that the ith couple sit next to each 
other, it follows that the desired probability is $1 - P\left(\bigcup_{i=1}^{10} E_i\right)$. Now, from the 
inclusion-exclusion identity, 

$$\begin{aligned}
P\left(\bigcup_{i=1}^{10} E_i\right) = {} & \sum_{i=1}^{10} P(E_i) - \cdots + (-1)^{n+1} \sum_{i_1 < i_2 < \cdots < i_n} P(E_{i_1}E_{i_2}\cdots E_{i_n}) \\
& + \cdots - P(E_1E_2\cdots E_{10})
\end{aligned}$$


To compute $P(E_{i_1}E_{i_2}\cdots E_{i_n})$, we first note that there are 19! ways of arranging 20 
people around a round table. (Why?) The number of arrangements that result in 
a specified set of n men sitting next to their wives can most easily be obtained by 
first thinking of each of the n married couples as being single entities. If this were 
the case, then we would need to arrange 20 − n entities around a round table, 
and there are clearly (20 − n − 1)! such arrangements. Finally, since each of the 
n married couples can be arranged next to each other in one of two possible 
ways, it follows that there are $2^n(20 - n - 1)!$ arrangements that result in a 
specified set of n men each sitting next to their wives. Therefore, 

$$P(E_{i_1}E_{i_2}\cdots E_{i_n}) = \frac{2^n(19-n)!}{(19)!}$$


Thus, from Proposition 4.4, we obtain that the probability that at least one 
married couple sits together is 

$$\binom{10}{1}2^1\frac{(18)!}{(19)!} - \binom{10}{2}2^2\frac{(17)!}{(19)!} + \binom{10}{3}2^3\frac{(16)!}{(19)!} - \cdots - \binom{10}{10}2^{10}\frac{9!}{(19)!} \approx .6605$$

and the desired probability is approximately .3395. 
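

The alternating sum is tedious by hand but short in code. The following sketch evaluates it exactly; the function name `p_no_couple_adjacent` and its `couples` parameter are assumptions introduced here so that the same formula can be tried on smaller tables.

```python
from fractions import Fraction
from math import comb, factorial


def p_no_couple_adjacent(couples=10):
    """P{no wife sits next to her husband} for `couples` couples at a round table."""
    people = 2 * couples
    total = factorial(people - 1)  # circular arrangements of all the people
    # Inclusion-exclusion: C(couples, n) sets of n couples, each set having
    # probability 2^n (people - 1 - n)! / (people - 1)! of all sitting together.
    p_union = sum(
        (-1) ** (n + 1) * comb(couples, n)
        * Fraction(2 ** n * factorial(people - 1 - n), total)
        for n in range(1, couples + 1)
    )
    return 1 - p_union


print(float(p_no_couple_adjacent(10)))
```

With only 2 couples the function returns 1/3, which is easy to confirm by listing the 3! = 6 seatings of four people around a table.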
*Example 5o Runs 


Consider an athletic team that had just finished its season with a final record of n 
wins and m losses. By examining the sequence of wins and losses, we are 
hoping to determine whether the team had stretches of games in which it was 
more likely to win than at other times. One way to gain some insight into this 
question is to count the number of runs of wins and then see how likely that 
result would be when all (n + m)!/(n! m!) orderings of the n wins and m losses 
are assumed equally likely. By a run of wins, we mean a consecutive sequence 
of wins. For instance, if n = 10, m = 6, and the sequence of outcomes was 
WWLLWWWLWLLLWWWW, then there would be 4 runs of wins—the first run 
being of size 2, the second of size 3, the third of size 1, and the fourth of size 4. 


Suppose now that a team has n wins and m losses. Assuming that all 
$(n+m)!/(n!\,m!) = \binom{n+m}{n}$ orderings are equally likely, let us determine the probability 
that there will be exactly r runs of wins. To do so, consider first any vector of 
positive integers $x_1, x_2, \ldots, x_r$ with $x_1 + \cdots + x_r = n$, and let us see how many 
outcomes result in r runs of wins in which the ith run is of size $x_i$, i = 1, ..., r. For 
any such outcome, if we let $y_1$ denote the number of losses before the first run of 
wins, $y_2$ the number of losses between the first 2 runs of wins, ..., $y_{r+1}$ the 
number of losses after the last run of wins, then the $y_i$ satisfy 

$$y_1 + y_2 + \cdots + y_{r+1} = m, \qquad y_1 \ge 0,\ y_{r+1} \ge 0,\ y_i > 0,\ i = 2, \ldots, r$$

and the outcome can be represented schematically as 

$$\underbrace{LL \ldots L}_{y_1}\ \underbrace{WW \ldots W}_{x_1}\ \underbrace{L \ldots L}_{y_2}\ \underbrace{WW \ldots W}_{x_2}\ \cdots\ \underbrace{WW \ldots W}_{x_r}\ \underbrace{L \ldots L}_{y_{r+1}}$$


Hence, the number of outcomes that result in r runs of wins, the ith of size 
$x_i$, i = 1, ..., r, is equal to the number of integers $y_1, \ldots, y_{r+1}$ that satisfy the 
foregoing, or, equivalently, to the number of positive integers 

$$\bar{y}_1 = y_1 + 1, \qquad \bar{y}_i = y_i,\ i = 2, \ldots, r, \qquad \bar{y}_{r+1} = y_{r+1} + 1$$

that satisfy 

$$\bar{y}_1 + \bar{y}_2 + \cdots + \bar{y}_{r+1} = m + 2$$

By Proposition 6.1 in Chapter 1, there are $\binom{m+1}{r}$ such outcomes. 
Hence, the total number of outcomes that result in r runs of wins is $\binom{m+1}{r}$ 
multiplied by the number of positive integral solutions of $x_1 + \cdots + x_r = n$. Thus, 
again from Proposition 6.1, there are $\binom{m+1}{r}\binom{n-1}{r-1}$ outcomes resulting in 
r runs of wins. As there are $\binom{n+m}{n}$ equally likely outcomes, it follows that 

$$P(\{r \text{ runs of wins}\}) = \frac{\dbinom{m+1}{r}\dbinom{n-1}{r-1}}{\dbinom{m+n}{n}}, \qquad r \ge 1$$

For example, if n = 8 and m = 6, then the probability of 7 runs is 
$\binom{7}{7}\binom{7}{6}\big/\binom{14}{8} = 1/429$ if all $\binom{14}{8}$ 
outcomes are equally likely. Hence, if the outcome was WLWLWLWLWWLWLW, 
then we might suspect that the team’s probability of winning was changing over 
time. (In particular, the probability that the team wins seems to be quite high 
when it lost its last game and quite low when it won its last game.) On the other 
extreme, if the outcome were WWWWWWWWLLLLLL, then there would have 
been only 1 run, and as $P(\{1 \text{ run}\}) = \binom{7}{1}\binom{7}{0}\big/\binom{14}{8} = 1/429$, it would thus 
again seem unlikely that the team’s probability of winning remained unchanged 
over its 14 games. 

2.6 Probability as a Continuous Set Function 


A sequence of events $\{E_n, n \ge 1\}$ is said to be an increasing sequence if 

$$E_1 \subset E_2 \subset \cdots \subset E_n \subset E_{n+1} \subset \cdots$$

whereas it is said to be a decreasing sequence if 

$$E_1 \supset E_2 \supset \cdots \supset E_n \supset E_{n+1} \supset \cdots$$

If $\{E_n, n \ge 1\}$ is an increasing sequence of events, then we define a new event, 
denoted by $\lim_{n \to \infty} E_n$, by 

$$\lim_{n \to \infty} E_n = \bigcup_{i=1}^{\infty} E_i$$

Similarly, if $\{E_n, n \ge 1\}$ is a decreasing sequence of events, we define $\lim_{n \to \infty} E_n$ by 

$$\lim_{n \to \infty} E_n = \bigcap_{i=1}^{\infty} E_i$$

We now prove the following Proposition 6.1. 


Proposition 6.1 
If $\{E_n, n \ge 1\}$ is either an increasing or a decreasing sequence of events, then 

$$\lim_{n \to \infty} P(E_n) = P\left(\lim_{n \to \infty} E_n\right)$$


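Proof Suppose, first, that $\{E_n, n \ge 1\}$ is an increasing sequence, and define the 
events $F_n$, n ≥ 1, by 

$$\begin{aligned}
F_1 &= E_1 \\
F_n &= E_n\left(\bigcup_{i=1}^{n-1} E_i\right)^c = E_nE_{n-1}^c \qquad n > 1
\end{aligned}$$

where we have used the fact that $\bigcup_{i=1}^{n-1} E_i = E_{n-1}$, since the events are 
increasing. In words, $F_n$ consists of those outcomes in $E_n$ that are not in any of 
the earlier $E_i$, i < n. It is easy to verify that the $F_n$ are mutually exclusive events 
such that 

$$\bigcup_{i=1}^{\infty} F_i = \bigcup_{i=1}^{\infty} E_i \qquad \text{and} \qquad \bigcup_{i=1}^{n} F_i = \bigcup_{i=1}^{n} E_i \quad \text{for all } n \ge 1$$

Thus, 

$$\begin{aligned}
P\left(\bigcup_{i=1}^{\infty} E_i\right) &= P\left(\bigcup_{i=1}^{\infty} F_i\right) \\
&= \sum_{i=1}^{\infty} P(F_i) \qquad \text{(by Axiom 3)} \\
&= \lim_{n \to \infty} \sum_{i=1}^{n} P(F_i) \\
&= \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} F_i\right) \\
&= \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} E_i\right) \\
&= \lim_{n \to \infty} P(E_n)
\end{aligned}$$

which proves the result when $\{E_n, n \ge 1\}$ is increasing. 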


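If $\{E_n, n \ge 1\}$ is a decreasing sequence, then $\{E_n^c, n \ge 1\}$ is an increasing 
sequence; hence, from the preceding equations, 

$$P\left(\bigcup_{i=1}^{\infty} E_i^c\right) = \lim_{n \to \infty} P(E_n^c)$$

However, because $\bigcup_{i=1}^{\infty} E_i^c = \left(\bigcap_{i=1}^{\infty} E_i\right)^c$, it follows that 

$$P\left(\left(\bigcap_{i=1}^{\infty} E_i\right)^c\right) = \lim_{n \to \infty} P(E_n^c)$$

or, equivalently, 

$$1 - P\left(\bigcap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} [1 - P(E_n)] = 1 - \lim_{n \to \infty} P(E_n)$$

or 

$$P\left(\bigcap_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} P(E_n)$$

which proves the result. 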


Example 6a Probability and a “Paradox” 


Suppose that we possess an infinitely large urn and an infinite collection of balls 
labeled ball number 1, number 2, number 3, and so on. Consider an experiment 
performed as follows: At 1 minute to 12 P.M., balls numbered 1 through 10 are 
placed in the urn and ball number 10 is withdrawn. (Assume that the withdrawal 
takes no time.) At $\frac{1}{2}$ minute to 12 P.M., balls numbered 11 through 20 are placed 
in the urn and ball number 20 is withdrawn. At $\frac{1}{4}$ minute to 12 P.M., balls 
numbered 21 through 30 are placed in the urn and ball number 30 is withdrawn. 
At $\frac{1}{8}$ minute to 12 P.M., and so on. The question of interest is, How many balls are 
in the urn at 12 P.M.? 


The answer to this question is clearly that there is an infinite number of balls in 
the urn at 12 P.M., since any ball whose number is not of the form 10n, n ≥ 1, will 
have been placed in the urn and will not have been withdrawn before 12 P.M. 
Hence, the problem is solved when the experiment is performed as described. 


However, let us now change the experiment and suppose that at 1 minute to 12 
P.M., balls numbered 1 through 10 are placed in the urn and ball number 1 is 
withdrawn; at $\frac{1}{2}$ minute to 12 P.M., balls numbered 11 through 20 are placed in the 
urn and ball number 2 is withdrawn; at $\frac{1}{4}$ minute to 12 P.M., balls numbered 21 
through 30 are placed in the urn and ball number 3 is withdrawn; at $\frac{1}{8}$ minute to 
12 P.M., balls numbered 31 through 40 are placed in the urn and ball number 4 is 
withdrawn, and so on. For this new experiment, how many balls are in the urn at 
12 P.M.? 


Surprisingly enough, the answer now is that the urn is empty at 12 P.M. For, 
consider any ball, say, ball number n. At some time prior to 12 P.M. [in particular, 
at $\left(\frac{1}{2}\right)^{n-1}$ minutes to 12 P.M.], this ball would have been withdrawn from the urn. 
Hence, for each n, ball number n is not in the urn at 12 P.M.; therefore, the urn 
must be empty at that time. 


Because for all n, the number of balls in the urn after the nth interchange is the 
same in both variations of the experiment, most people are surprised that the two 
scenarios produce such different results in the limit. It is important to recognize 
that the reason the results are different is not because there is an actual paradox, 
or mathematical contradiction, but rather because of the logic of the situation, 
and also that the surprise results because one’s initial intuition when dealing with 
infinity is not always correct. (This latter statement is not surprising, for when the 
theory of the infinite was first developed by the mathematician Georg Cantor in 
the second half of the nineteenth century, many of the other leading 
mathematicians of the day called it nonsensical and ridiculed Cantor for making 
such claims as that the set of all integers and the set of all even integers have 
the same number of elements.) 


We see from the preceding discussion that the manner in which the balls are 
withdrawn makes a difference. For, in the first case, only balls numbered 
10n, n ≥ 1, are ever withdrawn, whereas in the second case all of the balls are 
eventually withdrawn. Let us now suppose that whenever a ball is to be 
withdrawn, that ball is randomly selected from among those present. That is, 
suppose that at 1 minute to 12 P.M. balls numbered 1 through 10 are placed in 
the urn and a ball is randomly selected and withdrawn, and so on. In this case, 
how many balls are in the urn at 12 P.M.? 


Solution 


We shall show that, with probability 1, the urn is empty at 12 P.M. Let us first 
consider ball number 1. Define $E_n$ to be the event that ball number 1 is still in the 
urn after the first n withdrawals have been made. Clearly, 

$$P(E_n) = \frac{9 \cdot 18 \cdot 27 \cdots (9n)}{10 \cdot 19 \cdot 28 \cdots (9n+1)}$$

[To understand this equation, just note that if ball number 1 is still to be in the urn 
after the first n withdrawals, the first ball withdrawn can be any one of 9, the 
second any one of 18 (there are 19 balls in the urn at the time of the second 
withdrawal, one of which must be ball number 1), and so on. The denominator is 
similarly obtained.] 


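Now, the event that ball number 1 is in the urn at 12 P.M. is just the event 
$\bigcap_{n=1}^{\infty} E_n$. Because the events $E_n$, n ≥ 1, are decreasing events, it follows from 
Proposition 6.1 that 

$$\begin{aligned}
P\{\text{ball number 1 is in the urn at 12 P.M.}\} &= P\left(\bigcap_{n=1}^{\infty} E_n\right) \\
&= \lim_{n \to \infty} P(E_n) \\
&= \prod_{n=1}^{\infty} \frac{9n}{9n+1}
\end{aligned}$$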


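We now show that 

$$\prod_{n=1}^{\infty} \frac{9n}{9n+1} = 0$$

Since 

$$\prod_{n=1}^{\infty} \frac{9n}{9n+1} = \left[\prod_{n=1}^{\infty} \frac{9n+1}{9n}\right]^{-1}$$

this is equivalent to showing that 

$$\prod_{n=1}^{\infty} \left(1 + \frac{1}{9n}\right) = \infty$$

Now, for all m ≥ 1, 

$$\begin{aligned}
\prod_{n=1}^{\infty} \left(1 + \frac{1}{9n}\right) &\ge \prod_{n=1}^{m} \left(1 + \frac{1}{9n}\right) \\
&= \left(1 + \frac{1}{9}\right)\left(1 + \frac{1}{18}\right)\left(1 + \frac{1}{27}\right) \cdots \left(1 + \frac{1}{9m}\right) \\
&> \frac{1}{9} + \frac{1}{18} + \frac{1}{27} + \cdots + \frac{1}{9m} \\
&= \frac{1}{9} \sum_{i=1}^{m} \frac{1}{i}
\end{aligned}$$

Hence, letting m → ∞ and using the fact that $\sum_{i=1}^{\infty} 1/i = \infty$ yields 

$$\prod_{n=1}^{\infty} \left(1 + \frac{1}{9n}\right) = \infty$$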


Thus, letting $F_i$ denote the event that ball number i is in the urn at 12 P.M., we 
have shown that $P(F_1) = 0$. Similarly, we can show that $P(F_i) = 0$ for all i. 
(For instance, the same reasoning shows that $P(F_i) = \prod_{n=2}^{\infty} [9n/(9n+1)]$ for 
i = 11, 12, ..., 20.) Therefore, the probability that the urn is not empty at 12 P.M., 
$P\left(\bigcup_{i=1}^{\infty} F_i\right)$, satisfies 

$$P\left(\bigcup_{i=1}^{\infty} F_i\right) \le \sum_{i=1}^{\infty} P(F_i) = 0$$

by Boole’s inequality. 
Thus, with probability 1, the urn will be empty at 12 P.M. 
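

Although $P(E_n)$ tends to 0, it does so very slowly, which is easy to see numerically. The sketch below simply multiplies out the finite product for a few values of n; the function name `prob_ball_1_remains` is an illustrative choice, and floating-point arithmetic is assumed to be accurate enough for this purpose.

```python
def prob_ball_1_remains(n):
    """P(E_n): product over k = 1..n of 9k / (9k + 1)."""
    p = 1.0
    for k in range(1, n + 1):
        p *= 9 * k / (9 * k + 1)
    return p


for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, prob_ball_1_remains(n))
```

The printed values keep decreasing but remain well away from 0 even after a million withdrawals. This is consistent with the proof, whose lower bound on the reciprocal product grows only like the harmonic series.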


2.7 Probability as a Measure of Belief 


Thus far we have interpreted the probability of an event of a given experiment as 
being a measure of how frequently the event will occur when the experiment is 
continually repeated. However, there are also other uses of the term probability. For 
instance, we have all heard such statements as “It is 90 percent probable that 
Shakespeare actually wrote Hamlet” or “The probability that Oswald acted alone in 
assassinating Kennedy is .8.” How are we to interpret these statements? 


The most simple and natural interpretation is that the probabilities referred to are 
measures of the individual’s degree of belief in the statements that he or she is 
making. In other words, the individual making the foregoing statements is quite 
certain that Oswald acted alone and is even more certain that Shakespeare wrote 
Hamlet. This interpretation of probability as being a measure of the degree of one’s 
belief is often referred to as the personal or subjective view of probability. 


It seems logical to suppose that a “measure of the degree of one’s belief” should 
satisfy all of the axioms of probability. For example, if we are 70 percent certain that 
Shakespeare wrote Julius Caesar and 10 percent certain that it was actually 
Marlowe, then it is logical to suppose that we are 80 percent certain that it was either 
Shakespeare or Marlowe. Hence, whether we interpret probability as a measure of 
belief or as a long-run frequency of occurrence, its mathematical properties remain 
unchanged. 


Example 7a 


Suppose that in a 7-horse race, you believe that each of the first 2 horses has a 
20 percent chance of winning, horses 3 and 4 each have a 15 percent chance, 
and the remaining 3 horses have a 10 percent chance each. Would it be better 
for you to wager at even money that the winner will be one of the first three 
horses or to wager, again at even money, that the winner will be one of the 
horses 1, 5, 6, and 7? 


Solution 


On the basis of your personal probabilities concerning the outcome of the race, 
your probability of winning the first bet is .2 + .2 + .15 = .55, whereas it is 
.2+.1+.1+.1=.5 for the second bet. Hence, the first wager is more attractive. 


Note that in supposing that a person’s subjective probabilities are always consistent 
with the axioms of probability, we are dealing with an idealized rather than an actual 
person. For instance, if we were to ask someone what he thought the chances were 
of 


a. rain today, 

b. rain tomorrow, 

c. rain both today and tomorrow, 
d. rain either today or tomorrow, 


it is quite possible that, after some deliberation, he might give 30 percent, 40 percent, 
20 percent, and 60 percent as answers. Unfortunately, such answers (or such 
subjective probabilities) are not consistent with the axioms of probability. (Why not?) 
We would of course hope that after this was pointed out to the respondent, he would 
change his answers. (One possibility we could accept is 30 percent, 40 percent, 10 
percent, and 60 percent.) 
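

One way to test such a set of answers is against the identity P(A ∪ B) = P(A) + P(B) − P(AB). The sketch below applies that single check to the two sets of answers just quoted; the function name `additive_consistent` is hypothetical, and probabilities are entered as whole percentages to avoid floating-point rounding.

```python
def additive_consistent(p_a, p_b, p_both, p_either):
    """Check P(A or B) == P(A) + P(B) - P(A and B), all values in whole percent."""
    return p_either == p_a + p_b - p_both


# Answers for: rain today, rain tomorrow, rain both days, rain on either day.
print(additive_consistent(30, 40, 20, 60))   # the respondent's first answers
print(additive_consistent(30, 40, 10, 60))   # the revised answers
```

Passing this check is necessary but not sufficient for full consistency; for example, the answer for "both" must also not exceed either single-day answer.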


Summary 


Let S denote the set of all possible outcomes of an experiment. S is called the sample 
space of the experiment. An event is a subset of S. If $A_i$, i = 1, ..., n, are events, then 
$\bigcup_{i=1}^{n} A_i$, called the union of these events, consists of all outcomes that are in at 
least one of the events $A_i$, i = 1, ..., n. Similarly, $\bigcap_{i=1}^{n} A_i$, sometimes written as $A_1 \cdots A_n$, 
is called the intersection of the events $A_i$ and consists of all outcomes that are in all 
of the events $A_i$, i = 1, ..., n. 


For any event A, we define $A^c$ to consist of all outcomes in the sample space that are 
not in A. We call $A^c$ the complement of the event A. The event $S^c$, which is empty of 
outcomes, is designated by Ø and is called the null set. If AB = Ø, then we say that A 
and B are mutually exclusive. 


For each event A of the sample space S, we suppose that a number P(A), called the 
probability of A, is defined and is such that 


i. $0 \le P(A) \le 1$ 
ii. $P(S) = 1$ 
iii. For mutually exclusive events $A_i$, i ≥ 1, 

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

P(A) represents the probability that the outcome of the experiment is in A. 
It can be shown that 

$$P(A^c) = 1 - P(A)$$

A useful result is that 

$$P(A \cup B) = P(A) + P(B) - P(AB)$$

which can be generalized to give 

$$\begin{aligned}
P\left(\bigcup_{i=1}^{n} A_i\right) = {} & \sum_{i=1}^{n} P(A_i) - \sum\sum_{i<j} P(A_iA_j) + \sum\sum\sum_{i<j<k} P(A_iA_jA_k) \\
& + \cdots + (-1)^{n+1} P(A_1 \cdots A_n)
\end{aligned}$$

This result is known as the inclusion-exclusion identity. 


If S is finite and each one-point set is assumed to have equal probability, then 

$$P(A) = \frac{|A|}{|S|}$$

where |E| denotes the number of outcomes in the event E. 


P(A) can be interpreted either as a long-run relative frequency or as a measure of 
one’s degree of belief. 


Problems 


1. A box contains 3 marbles: 1 red, 1 green, and 1 blue. Consider an 
experiment that consists of taking 1 marble from the box and then replacing it 
in the box and drawing a second marble from the box. Describe the sample 
space. Repeat when the second marble is drawn without replacing the first 
marble. 


2. In an experiment, a die is rolled continually until a 6 appears, at which point 
the experiment stops. What is the sample space of this experiment? Let $E_n$ 
denote the event that n rolls are necessary to complete the experiment. What 
points of the sample space are contained in $E_n$? What is $\left(\bigcup_{n=1}^{\infty} E_n\right)^c$? 
3. Two dice are thrown. Let E be the event that the sum of the dice is odd, let 
F be the event that at least one of the dice lands on 1, and let G be the event 
that the sum is 5. Describe the events $EF$, $E \cup F$, $FG$, $EF^c$, and $EFG$. 
4. A, B, and C take turns flipping a coin. The first one to get a head wins. The 
sample space of this experiment can be defined by 

$$S = \{1, 01, 001, 0001, \ldots, 0000\cdots\}$$

a. Interpret the sample space. 
b. Define the following events in terms of S: 

i. A wins = A. 
ii. B wins = B. 
iii. $(A \cup B)^c$. 

Assume that A flips first, then B, then C, then A, and so on. 


5. A system is composed of 5 components, each of which is either working or 
failed. Consider an experiment that consists of observing the status of each 
component, and let the outcome of the experiment be given by the vector 
$(x_1, x_2, x_3, x_4, x_5)$, where $x_i$ is equal to 1 if component i is working and is 
equal to 0 if component i is failed. 
a. How many outcomes are in the sample space of this experiment? 
b. Suppose that the system will work if components 1 and 2 are both 
working, or if components 3 and 4 are both working, or if components 
1, 3, and 5 are all working. Let W be the event that the system will 
work. Specify all the outcomes in W. 
c. Let A be the event that components 4 and 5 are both failed. How many 
outcomes are contained in the event A? 
d. Write out all the outcomes in the event AW. 


6. A hospital administrator codes incoming patients suffering gunshot wounds 
according to whether they have insurance (coding 1 if they do and 0 if they do 
not) and according to their condition, which is rated as good (g), fair (f), or 
serious (s). Consider an experiment that consists of the coding of such a 
patient. 
a. Give the sample space of this experiment. 
b. Let A be the event that the patient is in serious condition. Specify the 
outcomes in A. 
c. Let B be the event that the patient is uninsured. Specify the outcomes 
in B. 
d. Give all the outcomes in the event $B^c \cup A$. 
7. Consider an experiment that consists of determining the type of job (either 
blue collar or white collar) and the political affiliation (Republican, Democratic, 
or Independent) of the 15 members of an adult soccer team. How many 
outcomes are 
a. in the sample space? 
b. in the event that at least one of the team members is a blue-collar 
worker? 
c. in the event that none of the team members considers himself or 
herself an Independent? 


8. Suppose that A and B are mutually exclusive events for which P(A) = .3 
and P(B) = .5. What is the probability that 

a. either A or B occurs? 

b. A occurs but B does not? 

c. both A and B occur? 


9. A retail establishment accepts either the American Express or the VISA 
credit card. A total of 24 percent of its customers carry an American Express 
card, 61 percent carry a VISA card, and 11 percent carry both cards. What 
percentage of its customers carry a credit card that the establishment will 
accept? 
10. Sixty percent of the students at a certain school wear neither a ring nor a 
necklace. Twenty percent wear a ring and 30 percent wear a necklace. If one 
of the students is chosen randomly, what is the probability that this student is 
wearing 

a. a ring or a necklace? 

b. a ring and a necklace? 


11. A total of 28 percent of American males smoke cigarettes, 7 percent 
smoke cigars, and 5 percent smoke both cigars and cigarettes. 
a. What percentage of males smokes neither cigars nor cigarettes? 
b. What percentage smokes cigars but not cigarettes? 


12. An elementary school is offering 3 language classes: one in Spanish, one 
in French, and one in German. The classes are open to any of the 100 
students in the school. There are 28 students in the Spanish class, 26 in the 
French class, and 16 in the German class. There are 12 students who are in 
both Spanish and French, 4 who are in both Spanish and German, and 6 who 
are in both French and German. In addition, there are 2 students taking all 3 
classes. 

a. If a student is chosen randomly, what is the probability that he or she is 
not in any of the language classes? 

b. If a student is chosen randomly, what is the probability that he or she is 
taking exactly one language class? 

c. If 2 students are chosen randomly, what is the probability that at least 1 
is taking a language class? 


13. A certain town with a population of 100,000 has 3 newspapers: I, II, and 
III. The proportions of townspeople who read these papers are as follows: 
I: 10 percent      I and II: 8 percent      I and II and III: 1 percent 
II: 30 percent     I and III: 2 percent 
III: 5 percent     II and III: 4 percent 
(The list tells us, for instance, that 8000 people read newspapers I and II.) 
a. Find the number of people who read only one newspaper. 
b. How many people read at least two newspapers? 
c. If I and III are morning papers and II is an evening paper, how many 
people read at least one morning paper plus an evening paper? 
d. How many people do not read any newspapers? 
e. How many people read only one morning paper and one evening 
paper? 


14. The following data were given in a study of a group of 1000 subscribers to 
a certain magazine: In reference to job, marital status, and education, there 
were 312 professionals, 470 married persons, 525 college graduates, 42 
professional college graduates, 147 married college graduates, 86 married 
professionals, and 25 married professional college graduates. Show that the 
numbers reported in the study must be incorrect. 

Hint: Let M, W, and G denote, respectively, the set of professionals, married 
persons, and college graduates. Assume that one of the 1000 persons is 
chosen at random, and use Proposition 4.4 to show that if the given 
numbers are correct, then $P(M \cup W \cup G) > 1$. 


15. If it is assumed that all $\binom{52}{5}$ poker hands are equally likely, what is the 
probability of being dealt 

a. a flush? (A hand is said to be a flush if all 5 cards are of the same suit.) 

b. one pair? (This occurs when the cards have denominations a, a, b, c, d, 
where a, b, c, and d are all distinct.) 

c. two pairs? (This occurs when the cards have denominations a, a, b, b, c, 
where a, b, and c are all distinct.) 

d. three of a kind? (This occurs when the cards have denominations a, a, 
a, b, c, where a, b, and c are all distinct.) 

e. four of a kind? (This occurs when the cards have denominations a, a, a, 
a, b.) 


16. Poker dice is played by simultaneously rolling 5 dice. Show that 
a. P{no two alike} = .0926; 
b. P{one pair} = .4630; 
c. P{two pair} = .2315; 
d. P{three alike} = .1543; 
e. P{full house} = .0386; 
f. P{four alike} = .0193; 
g. P{five alike} = .0008. 


17. Twenty-five people, consisting of 15 women and 10 men, are lined up in a 
random order. Find the probability that the ninth woman to appear is in 
position 17. That is, find the probability that there are 8 women in positions 1 through 
16 and a woman in position 17. 
18. Two cards are randomly selected from an ordinary playing deck. What is 
the probability that they form a blackjack? That is, what is the probability that 
one of the cards is an ace and the other one is either a ten, a jack, a queen, or 
a king? 
19. Two symmetric dice have had two of their sides painted red, two painted 
black, one painted yellow, and the other painted white. When this pair of dice 
is rolled, what is the probability that both dice land with the same color face 
up? 
20. Suppose that you are playing blackjack against a dealer. In a freshly 
shuffled deck, what is the probability that neither you nor the dealer is dealt a 
blackjack? 
21. A small community organization consists of 20 families, of which 4 have 
one child, 8 have two children, 5 have three children, 2 have four children, and 
1 has five children. 

a. If one of these families is chosen at random, what is the probability it 

has i children, i = 1, 2, 3, 4, 5? 
b. If one of the children is randomly chosen, what is the probability that 
child comes from a family having i children, i = 1, 2, 3, 4, 5? 


22. Consider the following technique for shuffling a deck of n cards: For any 
initial ordering of the cards, go through the deck one card at a time and at 
each card, flip a fair coin. If the coin comes up heads, then leave the card 
where it is; if the coin comes up tails, then move that card to the end of the 
deck. After the coin has been flipped n times, say that one round has been 
completed. For instance, if n = 4 and the initial ordering is 1, 2, 3, 4, then if 
the successive flips result in the outcome h,t,t,h, then the ordering at the end 
of the round is 1, 4, 2, 3. Assuming that all possible outcomes of the sequence 
of n coin flips are equally likely, what is the probability that the ordering after
one round is the same as the initial ordering? 

23. A pair of fair dice is rolled. What is the probability that the second die 
lands on a higher value than does the first? 

24. If two dice are rolled, what is the probability that the sum of the upturned 
faces equals i? Find it for i = 2, 3, ..., 11, 12.

25. A pair of dice is rolled until a sum of either 5 or 7 appears. Find the 
probability that a 5 occurs first. 

Hint: Let $E_n$ denote the event that a 5 occurs on the nth roll and no 5 or 7
occurs on the first n − 1 rolls. Compute $P(E_n)$ and argue that $\sum_{n=1}^{\infty} P(E_n)$ is
the desired probability.

26. The game of craps is played as follows: A player rolls two dice. If the sum
of the dice is either a 2, 3, or 12, the player loses; if the sum is either a 7 or an
11, the player wins. If the outcome is anything else, the player continues to roll
the dice until she rolls either the initial outcome or a 7. If the 7 comes first, the
player loses, whereas if the initial outcome reoccurs before the 7 appears, the
player wins. Compute the probability of a player winning at craps.

Hint: Let $E_i$ denote the event that the initial outcome is i and the player wins.
The desired probability is $\sum_{i=2}^{12} P(E_i)$. To compute $P(E_i)$, define the events
$E_{i,n}$ to be the event that the initial sum is i and the player wins on the nth roll.
Argue that $P(E_i) = \sum_{n=1}^{\infty} P(E_{i,n})$.
27. An urn contains 3 red and 7 black balls. Players A and B withdraw balls 
from the urn consecutively until a red ball is selected. Find the probability that 
A selects the red ball. (A draws the first ball, then B, and so on. There is no 
replacement of the balls drawn.) 
28. An urn contains 5 red, 6 blue, and 8 green balls. If a set of 3 balls is 
randomly selected, what is the probability that each of the balls will be (a) of 
the same color? (b) of different colors? Repeat under the assumption that 
whenever a ball is selected, its color is noted and it is then replaced in the urn 
before the next selection. This is known as sampling with replacement. 
29. An urn contains n white and m black balls, where n and m are positive 
numbers. 
a. If two balls are randomly withdrawn, what is the probability that they are 
the same color? 
b. If a ball is randomly withdrawn and then replaced before the second 
one is drawn, what is the probability that the withdrawn balls are the 
same color? 
c. Show that the probability in part (b) is always larger than the one in part
(a). 


30. The chess clubs of two schools consist of, respectively, 8 and 9 players. 
Four members from each club are randomly chosen to participate in a contest 
between the two schools. The chosen players from one team are then 
randomly paired with those from the other team, and each pairing plays a 
game of chess. Suppose that Rebecca and her sister Elise are on the chess 
clubs at different schools. What is the probability that 

a. Rebecca and Elise will be paired? 

b. Rebecca and Elise will be chosen to represent their schools but will not
play each other?
c. either Rebecca or Elise will be chosen to represent her school? 


31. A 3-person basketball team consists of a guard, a forward, and a center. 
a. If a person is chosen at random from each of three different such 
teams, what is the probability of selecting a complete team? 
b. What is the probability that all 3 players selected play the same 
position? 


32. A group of individuals containing b boys and g girls is lined up in random 
order; that is, each of the (b + g)! permutations is assumed to be equally 
likely. What is the probability that the person in the ith position, 1 ≤ i ≤ b + g,
is a girl? 
33. A forest contains 20 elk, of which 5 are captured, tagged, and then 
released. A certain time later, 4 of the 20 elk are captured. What is the 
probability that 2 of these 4 have been tagged? What assumptions are you 
making? 
34. The second Earl of Yarborough is reported to have bet at odds of 1000 to 
1 that a bridge hand of 13 cards would contain at least one card that is ten or 
higher. (By ten or higher we mean that a card is either a ten, a jack, a queen, 
a king, or an ace.) Nowadays, we call a hand that has no cards higher than 9
a Yarborough. What is the probability that a randomly selected bridge hand is 
a Yarborough? 
35. Seven balls are randomly withdrawn from an urn that contains 12 red, 16 
blue, and 18 green balls. Find the probability that 

a. 3 red, 2 blue, and 2 green balls are withdrawn; 

b. at least 2 red balls are withdrawn; 

c. all withdrawn balls are the same color; 

d. either exactly 3 red balls or exactly 3 blue balls are withdrawn. 
36. Two cards are chosen at random from a deck of 52 playing cards. What is
the probability that they 

a. are both aces? 

b. have the same value? 


37. An instructor gives her class a set of 10 problems with the information that 
the final exam will consist of a random selection of 5 of them. If a student has 
figured out how to do 7 of the problems, what is the probability that he or she 
will answer correctly 

a. all 5 problems? 

b. at least 4 of the problems? 


38. There are n socks, 3 of which are red, in a drawer. What is the value of n
if, when 2 of the socks are chosen randomly, the probability that they are both
red is 1/2?
39. There are 5 hotels in a certain town. If 3 people check into hotels in a day, 
what is the probability that they each check into a different hotel? What 
assumptions are you making? 
40. If 4 balls are randomly chosen from an urn containing 4 red, 5 white, 6
blue, and 7 green balls, find the probability that 

a. at least one of the 4 balls chosen is green; 

b. one ball of each color is chosen. 


41. If a die is rolled 4 times, what is the probability that 6 comes up at least
once? 

42. Two dice are thrown n times in succession. Compute the probability that
double 6 appears at least once. How large need n be to make this probability
at least 1/2?


43. 
a. If N people, including A and B, are randomly arranged in a line, what is 
the probability that A and B are next to each other? 
b. What would the probability be if the people were randomly arranged in 
a circle? 


44. Five people, designated as A, B, C, D, E, are arranged in linear order. 
Assuming that each possible order is equally likely, what is the probability that 
a. there is exactly one person between A and B? 
b. there are exactly two people between A and B? 
c. there are three people between A and B? 
45. A woman has n keys, of which one will open her door.
a. If she tries the keys at random, discarding those that do not work, what 
is the probability that she will open the door on her kth try? 
b. What if she does not discard previously tried keys? 


46. How many people have to be in a room in order that the probability that at
least two of them celebrate their birthday in the same month is at least 1/2?
Assume that all possible monthly outcomes are equally likely.
47. Suppose that 5 of the numbers 1, 2, ..., 14 are chosen. Find the probability 
that 9 is the third smallest value chosen. 
48. Given 20 people, what is the probability that among the 12 months in the 
year, there are 4 months containing exactly 2 birthdays and 4 containing 
exactly 3 birthdays? 
49. A group of 6 men and 6 women is randomly divided into 2 groups of size 6 
each. What is the probability that both groups will have the same number of 
men? 
50. In a hand of bridge, find the probability that you have 5 spades and your 
partner has the remaining 8. 
51. Suppose that n balls are randomly distributed into N compartments. Find 
the probability that m balls will fall into the first compartment. Assume that all 
$N^n$ arrangements are equally likely.
52. A closet contains 10 pairs of shoes. If 8 shoes are randomly selected, 
what is the probability that there will be 

a. no complete pair? 

b. exactly 1 complete pair? 


53. If 8 people, consisting of 4 couples, are randomly arranged in a row, find 
the probability that no person is next to their partner. 
54. Compute the probability that a bridge hand is void in at least one suit.
Note that the answer is not

$$\frac{\binom{4}{1}\binom{39}{13}}{\binom{52}{13}}$$

(Why not?)
Hint: Use Proposition 4.4.
55. Compute the probability that a hand of 13 cards contains 


a. the ace and king of at least one suit; 
b. all 4 of at least 1 of the 13 denominations. 




56. Two players play the following game: Player A chooses one of the three 
spinners pictured in Figure 2.6, and then player B chooses one of the
remaining two spinners. Both players then spin their spinner, and the one that 
lands on the higher number is declared the winner. Assuming that each 
spinner is equally likely to land in any of its 3 regions, would you rather be 
player A or player B? Explain your answer! 

Figure 2.6 Spinners 


Theoretical Exercises 


Prove the following relations: 


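1. $EF \subset E \subset E \cup F$.
2. If $E \subset F$, then $F^c \subset E^c$.
3. $F = FE \cup FE^c$ and $E \cup F = E \cup E^cF$.
4. $\left(\bigcup_{i=1}^{\infty} E_i\right)F = \bigcup_{i=1}^{\infty} E_iF$ and
$\left(\bigcap_{i=1}^{\infty} E_i\right) \cup F = \bigcap_{i=1}^{\infty} (E_i \cup F)$.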
5. For any sequence of events $E_1, E_2, \ldots$, define a new sequence $F_1, F_2, \ldots$ of
disjoint events (that is, events such that $F_iF_j = \emptyset$ whenever $i \neq j$) such that
for all $n \geq 1$,

$$\bigcup_{i=1}^{n} F_i = \bigcup_{i=1}^{n} E_i$$


6. Let E, F, and G be three events. Find expressions for the events so that, of
E, F, and G,
a. only E occurs;
b. both E and G, but not F, occur;
c. at least one of the events occurs;
d. at least two of the events occur;
e. all three events occur;
f. none of the events occurs;
g. at most one of the events occurs;
h. at most two of the events occur;
i. exactly two of the events occur;
j. at most three of the events occur.


7. Use Venn diagrams
a. to simplify the expression $(E \cup F)(E \cup F^c)$;
b. to prove DeMorgan’s laws for events E and F. [That is, prove
$(E \cup F)^c = E^cF^c$ and $(EF)^c = E^c \cup F^c$.]


8. Let S be a given set. If, for some k > 0, $S_1, S_2, \ldots, S_k$ are mutually exclusive
nonempty subsets of S such that $\bigcup_{i=1}^{k} S_i = S$, then we call the set
$\{S_1, S_2, \ldots, S_k\}$ a partition of S. Let $T_n$ denote the number of different partitions of
$\{1, 2, \ldots, n\}$. Thus, $T_1 = 1$ (the only partition being $S_1 = \{1\}$) and $T_2 = 2$ (the two
partitions being $\{\{1, 2\}\}$ and $\{\{1\}, \{2\}\}$).

a. Show, by computing all partitions, that $T_3 = 5$, $T_4 = 15$.

b. Show that

$$T_{n+1} = 1 + \sum_{k=1}^{n} \binom{n}{k} T_k$$

and use this equation to compute $T_{10}$.

Hint: One way of choosing a partition of n + 1 items is to call one of the
items special. Then we obtain different partitions by first choosing
k, k = 0, 1, ..., n, then a subset of size n − k of the nonspecial items, and
then any of the $T_k$ partitions of the remaining k nonspecial items. By
adding the special item to the subset of size n − k, we obtain a partition
of all n + 1 items.


9. Suppose that an experiment is performed n times. For any event E of the
sample space, let n(E) denote the number of times that event E occurs and
define f(E) = n(E)/n. Show that f(·) satisfies Axioms 1, 2, and 3.

10. Prove that 

$$P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E^cFG) - P(EF^cG) - P(EFG^c) - 2P(EFG)$$

11. If P(E) = .9 and P(F) = .8, show that P(EF) ≥ .7. In general, prove
Bonferroni’s inequality, namely,

$$P(EF) \geq P(E) + P(F) - 1$$


12. Show that the probability that exactly one of the events E or F occurs 
equals P(E) + P(F) — 2P(EF). 
13. Prove that 
P(EF‘) = P(E) — P(EF). 

14. Prove Proposition 4.4 by mathematical induction.
15. An urn contains M white and N black balls. If a random sample of size r is 
chosen, what is the probability that it contains exactly k white balls? 
16. Use induction to generalize Bonferroni’s inequality to n events. That is, 
show that 

$$P(E_1E_2 \cdots E_n) \geq P(E_1) + \cdots + P(E_n) - (n - 1)$$


17. Consider the matching problem, Example 5m, and define $A_N$ to be the
number of ways in which the N men can select their hats so that no man
selects his own. Argue that

$$A_N = (N - 1)(A_{N-1} + A_{N-2})$$

This formula, along with the boundary conditions $A_1 = 0$, $A_2 = 1$, can then be
solved for $A_N$, and the desired probability of no matches would be $A_N/N!$.
Hint: After the first man selects a hat that is not his own, there remain N — 1 
men to select among a set of N — 1 hats that does not contain the hat of one 
of these men. Thus, there is one extra man and one extra hat. Argue that we 
can get no matches either with the extra man selecting the extra hat or with 
the extra man not selecting the extra hat. 

18. Let $f_n$ denote the number of ways of tossing a coin n times such that
successive heads never appear. Argue that

$$f_n = f_{n-1} + f_{n-2}, \quad n \geq 2, \quad \text{where } f_0 = 1,\ f_1 = 2$$

Hint: How many outcomes are there that start with a head, and how many
start with a tail? If $P_n$ denotes the probability that successive heads never
appear when a coin is tossed n times, find $P_n$ (in terms of $f_n$) when all
possible outcomes of the n tosses are assumed equally likely. Compute $P_{10}$.
19. An urn contains n red and m blue balls. They are withdrawn one at a time
until a total of r, r ≤ n, red balls have been withdrawn. Find the probability that
a total of k balls are withdrawn. 

Hint: A total of k balls will be withdrawn if there are r — 1 red balls in the first 
k — 1 withdrawals and the kth withdrawal is a red ball. 

20. Consider an experiment whose sample space consists of a countably 
infinite number of points. Show that not all points can be equally likely. Can all 
points have a positive probability of occurring? 

*21. Consider Example 5o, which is concerned with the number of runs of
wins obtained when n wins and m losses are randomly permuted. Now
consider the total number of runs (that is, win runs plus loss runs) and show
that

$$P\{2k \text{ runs}\} = 2\,\frac{\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{n}}$$


Self-Test Problems and Exercises 


1. A cafeteria offers a three-course meal consisting of an entree, a starch, and 
a dessert. The possible choices are given in the following table: 


Course    Choices
Entree    Chicken or roast beef
Starch    Pasta or rice or potatoes
Dessert   Ice cream or Jello or apple pie or a peach


A person is to choose one course from each category. 
a. How many outcomes are in the sample space? 
b. Let A be the event that ice cream is chosen. How many outcomes are 
in A? 
c. Let B be the event that chicken is chosen. How many outcomes are in 
B?
d. List all the outcomes in the event AB. 
e. Let C be the event that rice is chosen. How many outcomes are in C? 
f. List all the outcomes in the event ABC. 


2. A customer visiting the suit department of a certain store will purchase a
suit with probability .22, a shirt with probability .30, and a tie with probability 
.28. The customer will purchase both a suit and a shirt with probability .11, 
both a suit and a tie with probability .14, and both a shirt and a tie with 
probability .10. A customer will purchase all 3 items with probability .06. What 
is the probability that a customer purchases 

a. none of these items? 

b. exactly 1 of these items? 


3. A deck of cards is dealt out. What is the probability that the 14th card dealt 
is an ace? What is the probability that the first ace occurs on the 14th card? 
4. Let A denote the event that the midtown temperature in Los Angeles is 70°F,
and let B denote the event that the midtown temperature in New York is 70°F.
Also, let C denote the event that the maximum of the midtown temperatures
in New York and in Los Angeles is 70°F. If P(A) = .3, P(B) = .4, and P(C) = .2,
find the probability that the minimum of the two midtown temperatures is 70°F.
5. An ordinary deck of 52 cards is shuffled. What is the probability that the top 
four cards have 

a. different denominations? 

b. different suits? 


6. Urn A contains 3 red and 3 black balls, whereas urn B contains 4 red and 6 
black balls. If a ball is randomly selected from each urn, what is the probability 
that the balls will be the same color? 

7. In a state lottery, a player must choose 8 of the numbers from 1 to 40. The
lottery commission then performs an experiment that selects 8 of these 40
numbers. Assuming that the choice of the lottery commission is equally likely
to be any of the $\binom{40}{8}$ combinations, what is the probability that a player has


a. all 8 of the numbers selected by the lottery commission? 
b. 7 of the numbers selected by the lottery commission? 
c. at least 6 of the numbers selected by the lottery commission? 


8. From a group of 3 first-year students, 4 sophomores, 4 juniors, and 3 
seniors, a committee of size 4 is randomly selected. Find the probability that 
the committee will consist of 

a. 1 from each class; 
b. 2 sophomores and 2 juniors;
c. only sophomores or juniors. 


9. For a finite set A, let N(A) denote the number of elements in A. 


a. Show that

$$N(A \cup B) = N(A) + N(B) - N(AB)$$

b. More generally, show that

$$N\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} N(A_i) - \sum\sum_{i<j} N(A_iA_j) + \cdots + (-1)^{n+1} N(A_1A_2 \cdots A_n)$$


10. Consider an experiment that consists of 6 horses, numbered 1 through 6, 
running a race, and suppose that the sample space consists of the 6! possible 
orders in which the horses finish. Let A be the event that the number-1 horse 
is among the top three finishers, and let B be the event that the number-2 
horse comes in second. How many outcomes are in the event A U B? 

11. A 5-card hand is dealt from a well-shuffled deck of 52 playing cards. What 
is the probability that the hand contains at least one card from each of the four 
suits? 

12. A basketball team consists of 6 frontcourt and 4 backcourt players. If 
players are divided into roommates at random, what is the probability that 
there will be exactly two roommate pairs made up of a backcourt and a 
frontcourt player? 

13. Suppose that a person chooses a letter at random from RESERVE
and then chooses one at random from VERTICAL. What is the probability 
that the same letter is chosen? 

14. Prove Boole’s inequality:

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} P(A_i)$$

15. Show that if $P(A_i) = 1$ for all $i \geq 1$, then $P\left(\bigcap_{i=1}^{\infty} A_i\right) = 1$.
16. Let $T_k(n)$ denote the number of partitions of the set {1, ..., n} into k
nonempty subsets, where 1 ≤ k ≤ n. (See Theoretical Exercise 8 for the
definition of a partition.) Argue that

$$T_k(n) = kT_k(n - 1) + T_{k-1}(n - 1)$$

Hint: In how many partitions is {1} a subset, and in how many is 1 an element
of a subset that contains other elements?
17. Five balls are randomly chosen, without replacement, from an urn that 
contains 5 red, 6 white, and 7 blue balls. Find the probability that at least one 
ball of each color is chosen. 
18. Four red, 8 blue, and 5 green balls are randomly arranged in a line. 

a. What is the probability that the first 5 balls are blue? 

b. What is the probability that none of the first 5 balls is blue? 

c. What is the probability that the final 3 balls are of different colors? 

d. What is the probability that all the red balls are together? 


19. Ten cards are randomly chosen from a deck of 52 cards that consists of 
13 cards of each of 4 different suits. Each of the selected cards is put in one of 
4 piles, depending on the suit of the card. 
a. What is the probability that the largest pile has 4 cards, the next largest 
has 3, the next largest has 2, and the smallest has 1 card? 
b. What is the probability that two of the piles have 3 cards, one has 4 
cards, and one has no cards? 


20. Balls are randomly removed from an urn initially containing 20 red and 10 
blue balls. What is the probability that all of the red balls are removed before 
all of the blue ones have been removed? 


Chapter 3 Conditional Probability and 
Independence 
3.1 Introduction


In this chapter, we introduce one of the most important concepts in probability theory, 
that of conditional probability. The importance of this concept is twofold. In the first 
place, we are often interested in calculating probabilities when some partial 
information concerning the result of an experiment is available; in such a situation, 
the desired probabilities are conditional. Second, even when no partial information is 
available, conditional probabilities can often be used to compute the desired 
probabilities more easily. 


3.2 Conditional Probabilities 


Suppose that we toss 2 dice, and suppose that each of the 36 possible outcomes is
equally likely to occur and hence has probability 1/36. Suppose further that we
observe that the first die is a 3. Then, given this information, what is the probability
that the sum of the 2 dice equals 8? To calculate this probability, we reason as
follows: Given that the initial die is a 3, there can be at most 6 possible outcomes of
our experiment, namely, (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and (3, 6). Since each of
these outcomes originally had the same probability of occurring, the outcomes
should still have equal probabilities. That is, given that the first die is a 3, the
(conditional) probability of each of the outcomes (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), and
(3, 6) is 1/6, whereas the (conditional) probability of the other 30 points in the sample
space is 0. Because (3, 5) is the only one of these six outcomes whose sum is 8, the
desired probability will be 1/6.


If we let E and F denote, respectively, the event that the sum of the dice is 8 and the 
event that the first die is a 3, then the probability just obtained is called the 
conditional probability that E occurs given that F has occurred and is denoted by 


P(E|F) 


A general formula for P(E|F) that is valid for all events E and F is derived in the same 
manner: If the event F occurs, then, in order for E to occur, it is necessary that the 
actual occurrence be a point both in E and in F; that is, it must be in EF. Now, since 
we know that F has occurred, it follows that F becomes our new, or reduced, sample 
space; hence, the probability that the event EF occurs will equal the probability of EF 
relative to the probability of F. That is, we have the following definition. 


Definition 
If P(F) > 0, then

(2.1)

$$P(E \mid F) = \frac{P(EF)}{P(F)}$$
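
As an informal check of Equation (2.1), the two-dice example that motivated the definition can be reproduced by brute-force enumeration. The following is only a sketch: the helper names `prob` and `cond_prob` are illustrative choices, not notation from the text, and equally likely outcomes are assumed throughout.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under equally likely outcomes; event is a predicate on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def cond_prob(e, f):
    """P(E|F) = P(EF) / P(F), defined only when P(F) > 0 (hypothetical helper)."""
    p_f = prob(f)
    if p_f == 0:
        raise ValueError("P(F) must be positive")
    return prob(lambda o: e(o) and f(o)) / p_f

def sum_is_8(o):      # event E: the sum of the dice is 8
    return o[0] + o[1] == 8

def first_is_3(o):    # event F: the first die is a 3
    return o[0] == 3

print(cond_prob(sum_is_8, first_is_3))  # 1/6
```

Exact fractions are used instead of floating point so that the result can be compared directly with the value 1/6 obtained above.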


Example 2a 


Joe is 80 percent certain that his missing key is in one of the two pockets of his 
hanging jacket, being 40 percent certain it is in the left-hand pocket and 40 
percent certain it is in the right-hand pocket. If a search of the left-hand pocket 
does not find the key, what is the conditional probability that it is in the other 
pocket? 


Solution 


If we let L be the event that the key is in the left-hand pocket of the jacket, and R
be the event that it is in the right-hand pocket, then the desired probability
$P(R \mid L^c)$ can be obtained as follows:

$$P(R \mid L^c) = \frac{P(RL^c)}{P(L^c)} = \frac{P(R)}{1 - P(L)} = \frac{.4}{.6} = 2/3$$


If each outcome of a finite sample space S is equally likely, then, conditional on the 
event that the outcome lies in a subset F ⊂ S, all outcomes in F become equally
likely. In such cases, it is often convenient to compute conditional probabilities of the 
form P(E| F) by using F as the sample space. Indeed, working with this reduced 
sample space often results in an easier and better understood solution. Our next two 
examples illustrate this point. 


Example 2b 


A coin is flipped twice. Assuming that all four points in the sample space 

S = {(h,h), (h,t), (t, h), (t, t)} are equally likely, what is the conditional probability 
that both flips land on heads, given that (a) the first flip lands on heads? (b) at 
least one flip lands on heads? 


Solution 


Let B = {(h,h)} be the event that both flips land on heads; let F = {(h,h), (h,t)} 
be the event that the first flip lands on heads; and let A = {(h,h), (h,t), (t,h)} be 
the event that at least one flip lands on heads. The probability for (a) can be
obtained from

$$P(B \mid F) = \frac{P(BF)}{P(F)} = \frac{P(\{(h,h)\})}{P(\{(h,h),(h,t)\})} = \frac{1/4}{2/4} = 1/2$$

For (b), we have

$$P(B \mid A) = \frac{P(BA)}{P(A)} = \frac{P(\{(h,h)\})}{P(\{(h,h),(h,t),(t,h)\})} = \frac{1/4}{3/4} = 1/3$$


Thus, the conditional probability that both flips land on heads given that the first
one does is 1/2, whereas the conditional probability that both flips land on heads 
given that at least one does is only 1/3. Many students initially find this latter 
result surprising. They reason that given that at least one flip lands on heads, 
there are two possible results: Either they both land on heads or only one does. 
Their mistake, however, is in assuming that these two possibilities are equally 
likely. Initially there are 4 equally likely outcomes. Because the information that at 
least one flip lands on heads is equivalent to the information that the outcome is 
not (t,t), we are left with the 3 equally likely outcomes (h, h), (h,t), (t,h), only one 
of which results in both flips landing on heads. 
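
The reduced-sample-space reasoning of Example 2b can be sketched in a few lines: keep only the outcomes consistent with the given information, then count within that smaller set. The function name `conditional` is an illustrative assumption, and the approach is valid only because all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two coin flips.
flips = list(product("ht", repeat=2))

def conditional(event, given):
    """P(event | given), computed by counting inside the reduced sample space."""
    reduced = [s for s in flips if given(s)]
    return Fraction(sum(1 for s in reduced if event(s)), len(reduced))

def both_heads(s):
    return s == ("h", "h")

def first_heads(s):
    return s[0] == "h"

def at_least_one_head(s):
    return "h" in s

print(conditional(both_heads, first_heads))        # 1/2
print(conditional(both_heads, at_least_one_head))  # 1/3
```

The second call makes the source of the common mistake visible: the reduced sample space has three outcomes, not two.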


Example 2c 


In the card game bridge, the 52 cards are dealt out equally to 4 players—called 
East, West, North, and South. If North and South have a total of 8 spades among 
them, what is the probability that East has 3 of the remaining 5 spades? 


Solution 


Probably the easiest way to compute the desired probability is to work with the 
reduced sample space. That is, given that North-South have a total of 8 spades 
among their 26 cards, there remains a total of 26 cards, exactly 5 of them being 
spades, to be distributed among the East-West hands. Since each distribution is 
equally likely, it follows that the conditional probability that East will have exactly 
3 spades among his or her 13 cards is

$$\frac{\binom{5}{3}\binom{21}{10}}{\binom{26}{13}} \approx .339$$

Multiplying both sides of Equation (2.1) by P(F), we obtain

(2.2)

$$P(EF) = P(F)P(E \mid F)$$

In words, Equation (2.2) states that the probability that both E and F occur is
equal to the probability that F occurs multiplied by the conditional probability of E
given that F occurred. Equation (2.2) is often quite useful in computing the
probability of the intersection of events.
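
The count used in Example 2c, where East receives 13 of the 26 remaining cards and 5 of those 26 are spades, can be evaluated exactly. This is a minimal sketch; the function name `east_spades` and its default arguments are illustrative assumptions chosen to match the numbers in that example.

```python
from fractions import Fraction
from math import comb

def east_spades(k, spades_left=5, cards_left=26, hand=13):
    """P(East holds exactly k of the remaining spades), by counting hands."""
    ways = comb(spades_left, k) * comb(cards_left - spades_left, hand - k)
    return Fraction(ways, comb(cards_left, hand))

p3 = east_spades(3)
print(p3, round(float(p3), 3))  # 39/115 0.339

# Sanity check: the probabilities over k = 0, ..., 5 sum to 1.
print(sum(east_spades(k) for k in range(6)))  # 1
```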


Example 2d 
Celine is undecided as to whether to take a French course or a chemistry course.
She estimates that her probability of receiving an A grade would be 1/2 in a French
course and 2/3 in a chemistry course. If Celine decides to base her decision on the
flip of a fair coin, what is the probability that she gets an A in chemistry?


Solution 


If we let C be the event that Celine takes chemistry and A denote the event that she
receives an A in whatever course she takes, then the desired probability is
P(CA), which is calculated by using Equation (2.2) as follows:

$$P(CA) = P(C)P(A \mid C) = \left(\frac{1}{2}\right)\left(\frac{2}{3}\right) = \frac{1}{3}$$


Example 2e 


Suppose that an urn contains 8 red balls and 4 white balls. We draw 2 balls from 
the urn without replacement. (a) If we assume that at each draw, each ball in the 
urn is equally likely to be chosen, what is the probability that both balls drawn are 
red? (b) Now suppose that the balls have different weights, with each red ball 
having weight r and each white ball having weight w. Suppose that the 
probability that a given ball in the urn is the next one selected is its weight divided 


by the sum of the weights of all balls currently in the urn. Now what is the 
probability that both balls are red? 


Solution 


Let $R_1$ and $R_2$ denote, respectively, the events that the first and second balls
drawn are red. Now, given that the first ball selected is red, there are 7 remaining
red balls and 4 white balls, so $P(R_2 \mid R_1) = 7/11$. As $P(R_1)$ is clearly 8/12, the desired
probability is

$$P(R_1R_2) = P(R_1)P(R_2 \mid R_1) = \left(\frac{2}{3}\right)\left(\frac{7}{11}\right) = \frac{14}{33}$$

Of course, this probability could have been computed by $P(R_1R_2) = \binom{8}{2} \Big/ \binom{12}{2}$.

For part (b), we again let $R_i$ be the event that the ith ball chosen is red and use

$$P(R_1R_2) = P(R_1)P(R_2 \mid R_1)$$

Now, number the red balls, and let $B_i$, i = 1, ..., 8, be the event that the first ball
drawn is red ball number i. Then

$$P(R_1) = P\left(\bigcup_{i=1}^{8} B_i\right) = \sum_{i=1}^{8} P(B_i) = 8\,\frac{r}{8r + 4w}$$

Moreover, given that the first ball is red, the urn then contains 7 red and 4 white
balls. Thus, by an argument similar to the preceding one,

$$P(R_2 \mid R_1) = \frac{7r}{7r + 4w}$$

Hence, the probability that both balls are red is

$$P(R_1R_2) = \frac{8r}{8r + 4w} \cdot \frac{7r}{7r + 4w}$$
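
Both parts of Example 2e follow the same pattern, P(R1 R2) = P(R1) P(R2 | R1), and differ only in the weights. A small sketch with exact arithmetic is given below; the function name `both_red` is a hypothetical helper, and the weights r = 2, w = 1 in the second call are arbitrary illustrative values, not numbers from the example.

```python
from fractions import Fraction

def both_red(red, white, r=1, w=1):
    """P(first two draws are red) when each ball is drawn with probability
    proportional to its weight (r per red ball, w per white ball)."""
    p_first = Fraction(red * r, red * r + white * w)                   # P(R1)
    p_second = Fraction((red - 1) * r, (red - 1) * r + white * w)      # P(R2 | R1)
    return p_first * p_second

print(both_red(8, 4))            # 14/33, part (a): equal weights
print(both_red(8, 4, r=2, w=1))  # 28/45, part (b) with illustrative weights
```

With equal weights the weighted formula reduces to the answer of part (a), which is a quick way to confirm that the two parts are consistent.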


A generalization of Equation (2.2), which provides an expression for the
probability of the intersection of an arbitrary number of events, is sometimes referred
to as the multiplication rule.


The multiplication rule

$$P(E_1E_2E_3 \cdots E_n) = P(E_1)P(E_2 \mid E_1)P(E_3 \mid E_1E_2) \cdots P(E_n \mid E_1 \cdots E_{n-1})$$


In words, the multiplication rule states that $P(E_1E_2 \cdots E_n)$, the probability that all of the
events $E_1, E_2, \ldots, E_n$ occur, is equal to $P(E_1)$, the probability that $E_1$ occurs, multiplied
by $P(E_2 \mid E_1)$, the conditional probability that $E_2$ occurs given that $E_1$ has occurred,
multiplied by $P(E_3 \mid E_1E_2)$, the conditional probability that $E_3$ occurs given that both $E_1$
and $E_2$ have occurred, and so on.


To prove the multiplication rule, just apply the definition of conditional probability to its
right-hand side, giving

$$P(E_1)\,\frac{P(E_1E_2)}{P(E_1)}\,\frac{P(E_1E_2E_3)}{P(E_1E_2)} \cdots \frac{P(E_1E_2 \cdots E_n)}{P(E_1E_2 \cdots E_{n-1})} = P(E_1E_2 \cdots E_n)$$


Example 2f 


In the match problem stated in Example 5m of Chapter 2, it was shown
that $P_N$, the probability that there are no matches when N people randomly select
from among their own N hats, is given by

$$P_N = \sum_{i=0}^{N} (-1)^i/i!$$


What is the probability that exactly k of the N people have matches? 


Solution 


Let us fix our attention on a particular set of k people and determine the
probability that these k individuals have matches and no one else does. Letting E
denote the event that everyone in this set has a match, and letting G be the event
that none of the other N − k people have a match, we have

$$P(EG) = P(E)P(G \mid E)$$

Now, let $F_i$, i = 1, ..., k, be the event that the ith member of the set has a match.
Then

$$P(E) = P(F_1F_2 \cdots F_k) = P(F_1)P(F_2 \mid F_1)P(F_3 \mid F_1F_2) \cdots P(F_k \mid F_1 \cdots F_{k-1})$$
$$= \frac{1}{N} \cdot \frac{1}{N-1} \cdot \frac{1}{N-2} \cdots \frac{1}{N-k+1} = \frac{(N-k)!}{N!}$$

Given that everyone in the set of k has a match, the other N − k people will be
randomly choosing among their own N − k hats, so the probability that none of
them has a match is equal to the probability of no matches in a problem having
N − k people choosing among their own N − k hats. Therefore,

$$P(G \mid E) = P_{N-k} = \sum_{i=0}^{N-k} (-1)^i/i!$$

showing that the probability that a specified set of k people have matches and no
one else does is

$$P(EG) = \frac{(N-k)!}{N!}\,P_{N-k}$$

Because there will be exactly k matches if the preceding is true for any of the
$\binom{N}{k}$ sets of k individuals, the desired probability is

$$P(\text{exactly } k \text{ matches}) = \binom{N}{k}\frac{(N-k)!}{N!}\,P_{N-k} = P_{N-k}/k! \approx e^{-1}/k! \quad \text{when } N \text{ is large}$$
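
For small N, the result of Example 2f can be compared against an exhaustive enumeration of all N! hat assignments. The sketch below assumes N = 6 purely for illustration; the function names are hypothetical, and the brute-force check is feasible only for small N because it lists every permutation.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_no_match(n):
    """P_n = sum_{i=0}^{n} (-1)^i / i!, the probability of no matches among n people."""
    return sum(Fraction((-1) ** i, factorial(i)) for i in range(n + 1))

def p_exactly_k(n, k):
    """Probability of exactly k matches among n people: P_{n-k} / k!."""
    return p_no_match(n - k) / factorial(k)

def brute_force(n, k):
    """Same probability by listing all n! equally likely hat assignments."""
    perms = list(permutations(range(n)))
    hits = sum(1 for p in perms if sum(p[i] == i for i in range(n)) == k)
    return Fraction(hits, len(perms))

N = 6  # illustrative value, small enough to enumerate
for k in range(N + 1):
    print(k, p_exactly_k(N, k), brute_force(N, k))
```

The two columns printed for each k agree, including the value 0 for k = N − 1, since it is impossible for exactly N − 1 people to have matches.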


Example 2g 


An ordinary deck of 52 playing cards is randomly divided into 4 piles of 13 cards 
each. Compute the probability that each pile has exactly 1 ace. 


Solution 


Define events $E_i$, i = 1, 2, 3, 4, as follows:

$E_1$ = {the ace of spades is in any one of the piles}
$E_2$ = {the ace of spades and the ace of hearts are in different piles}
$E_3$ = {the aces of spades, hearts, and diamonds are all in different piles}
$E_4$ = {all 4 aces are in different piles}

The desired probability is $P(E_1E_2E_3E_4)$, and by the multiplication rule,

$$P(E_1E_2E_3E_4) = P(E_1)P(E_2 \mid E_1)P(E_3 \mid E_1E_2)P(E_4 \mid E_1E_2E_3)$$

Now,

$$P(E_1) = 1$$


since $E_1$ is the sample space S. To determine $P(E_2 \mid E_1)$, consider the pile that
contains the ace of spades. Because its remaining 12 cards are equally likely to
be any 12 of the remaining 51 cards, the probability that the ace of hearts is
among them is 12/51, giving that

$$P(E_2 \mid E_1) = 1 - \frac{12}{51} = \frac{39}{51}$$

Also, given that the ace of spades and ace of hearts are in different piles, it
follows that the set of the remaining 24 cards of these two piles is equally likely to
be any set of 24 of the remaining 50 cards. As the probability that the ace of
diamonds is one of these 24 is 24/50, we see that

$$P(E_3 \mid E_1E_2) = 1 - \frac{24}{50} = \frac{26}{50}$$

Because the same logic as used in the preceding yields that

$$P(E_4 \mid E_1E_2E_3) = 1 - \frac{36}{49} = \frac{13}{49}$$

the probability that each pile has exactly 1 ace is

$$P(E_1E_2E_3E_4) = \frac{39 \cdot 26 \cdot 13}{51 \cdot 50 \cdot 49} \approx .105$$


That is, there is approximately a 10.5 percent chance that each pile will contain 
an ace. (Problem 13 gives another way of using the multiplication rule to 
solve this problem.) 
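
As a sanity check on the multiplication-rule answer, the short sketch below multiplies the three conditional probabilities exactly and compares the product with a counting argument that is not part of the example: the 4 aces occupy 4 of the 52 card positions, and one position per pile can be chosen in $13^4$ ways. The variable names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

# Multiplication rule: P(E1) P(E2|E1) P(E3|E1E2) P(E4|E1E2E3)
by_conditioning = (
    Fraction(1) * Fraction(39, 51) * Fraction(26, 50) * Fraction(13, 49)
)

# Counting check (an added argument, not from the example):
# choose one of 13 positions in each pile for an ace, out of all
# C(52, 4) ways to place the 4 aces among the 52 positions.
by_counting = Fraction(13 ** 4, comb(52, 4))

if __name__ == "__main__":
    print(by_conditioning, float(by_conditioning))
    print(by_counting)
```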


Example 2h 


Four of the eight teams in the quarterfinal round of the 2016 European 
Champions League Football (soccer) tournament were the acknowledged strong 
teams Barcelona, Bayern Munich, Real Madrid, and Paris St-Germain. The 
pairings in this round are supposed to be totally random, in the sense that all 
possible pairings are equally likely. Assuming this is so, find the probability that 
none of the strong teams play each other in this round. (Surprisingly, it seems to 
be a common occurrence in this tournament that, even though the pairings are 
supposedly random, the very strong teams are rarely matched against each 
other in this round.) 


Solution 


If we number the four strong teams 1 through 4, and then let $W_i$, $i = 1, 2, 3, 4$, be 
the event that team i plays one of the four weak teams, then the desired 
probability is $P(W_1W_2W_3W_4)$. By the multiplication rule, 


$$
\begin{aligned}
P(W_1W_2W_3W_4) &= P(W_1)P(W_2|W_1)P(W_3|W_1W_2)P(W_4|W_1W_2W_3) \\
&= (4/7)(3/5)(2/3)(1) \\
&= 8/35
\end{aligned}
$$


The preceding follows by first noting that because team 1 is equally likely to be 
matched with any of the other 7 teams, we have that $P(W_1) = 4/7$. Now, given 
that $W_1$ occurs, team 2 is equally likely to be matched with any of five teams: 
namely, teams 3, 4, or any of the three weak teams not matched with team 1. As 
three of these five teams are weak, we see that $P(W_2|W_1) = 3/5$. Similarly, 
given that events $W_1$ and $W_2$ have occurred, team 3 is equally likely to be 
matched with any from a set of three teams, consisting of team 4 and the 
remaining two weaker teams not matched with 1 or 2. Hence, 
$P(W_3|W_1W_2) = 2/3$. Finally, given that $W_1$, $W_2$, and $W_3$ all occur, team 4 will be 
matched with the remaining weak team not matched with any of 1, 2, 3, giving 
that $P(W_4|W_1W_2W_3) = 1$. 
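
The value 8/35 can also be confirmed by brute force. The sketch below is an illustration under the example's assumption that all pairings are equally likely; the helper `pairings` and the labeling of the weak teams as 5 through 8 are introduced here. It lists every way to split 8 teams into 4 unordered pairs and counts those in which no two strong teams meet.

```python
from fractions import Fraction


def pairings(teams):
    """Yield every way to split an even-sized list into unordered pairs."""
    if not teams:
        yield []
        return
    first, rest = teams[0], teams[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


STRONG = {1, 2, 3, 4}  # teams 5..8 stand in for the four weak teams

all_pairings = list(pairings(list(range(1, 9))))
no_strong_clash = [
    p for p in all_pairings
    if not any(a in STRONG and b in STRONG for a, b in p)
]
prob = Fraction(len(no_strong_clash), len(all_pairings))

if __name__ == "__main__":
    print(len(no_strong_clash), len(all_pairings), prob)
```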


Remarks Our definition of P(E | F) is consistent with the interpretation of probability 
as being a long-run relative frequency. To see this, suppose that n repetitions of the 
experiment are to be performed, where n is large. We claim that if we consider only 
those experiments in which F occurs, then P(E | F) will equal the long-run proportion 
of them in which E also occurs. To verify this statement, note that since P(F) is the 
long-run proportion of experiments in which F occurs, it follows that in the n 
repetitions of the experiment, F will occur approximately nP(F) times. Similarly, in 
approximately nP(EF) of these experiments, both E and F will occur. Hence, out of 
the approximately nP(F) experiments in which F occurs, the proportion of them in 
which E also occurs is approximately equal to 


$$\frac{nP(EF)}{nP(F)} = \frac{P(EF)}{P(F)}$$


Because this approximation becomes exact as n becomes larger and larger, we have 
the appropriate definition of P(E | F). 


3.3 Bayes’s Formula 


Let E and F be events. We may express E as 


$$E = EF \cup EF^c$$


for, in order for an outcome to be in E, it must either be in both E and F or be in E but 
not in F. (See Figure 3.1.) As $EF$ and $EF^c$ are clearly mutually exclusive, we 
have, by Axiom 3, 


$$
\begin{aligned}
P(E) &= P(EF) + P(EF^c) \\
&= P(E|F)P(F) + P(E|F^c)P(F^c) \\
&= P(E|F)P(F) + P(E|F^c)[1 - P(F)]
\end{aligned}
\tag{3.1}
$$


Figure 3.1 $E = EF \cup EF^c$. $EF$ = Shaded Area; $EF^c$ = Striped Area. 


Equation (3.1) states that the probability of the event E is a weighted average of 
the conditional probability of E given that F has occurred and the conditional 
probability of E given that F has not occurred—each conditional probability being 
given as much weight as the event on which it is conditioned has of occurring. This is 
an extremely useful formula, because its use often enables us to determine the 
probability of an event by first “conditioning” upon whether or not some second event 
has occurred. That is, there are many instances in which it is difficult to compute the 
probability of an event directly, but it is straightforward to compute it once we know 
whether or not some second event has occurred. We illustrate this idea with some 
examples. 


Example 3a 


(Part 1) 


An insurance company believes that people can be divided into two classes: 
those who are accident prone and those who are not. The company’s statistics 
show that an accident-prone person will have an accident at some time within a 
fixed 1-year period with probability .4, whereas this probability decreases to .2 for 
a person who is not accident prone. If we assume that 30 percent of the 
population is accident prone, what is the probability that a new policyholder will 
have an accident within a year of purchasing a policy? 


Solution 


We shall obtain the desired probability by first conditioning upon whether or not 
the policyholder is accident prone. Let $A_1$ denote the event that the policyholder 
will have an accident within a year of purchasing the policy, and let A denote the 
event that the policyholder is accident prone. Hence, the desired probability is 
given by 


$$
\begin{aligned}
P(A_1) &= P(A_1|A)P(A) + P(A_1|A^c)P(A^c) \\
&= (.4)(.3) + (.2)(.7) = .26
\end{aligned}
$$


Example 3a 
(Part 2) 


Suppose that a new policyholder has an accident within a year of purchasing a 
policy. What is the probability that he or she is accident prone? 


Solution 


The desired probability is 


$$
\begin{aligned}
P(A|A_1) &= \frac{P(AA_1)}{P(A_1)} \\
&= \frac{P(A)P(A_1|A)}{P(A_1)} \\
&= \frac{(.3)(.4)}{.26} = \frac{6}{13}
\end{aligned}
$$
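
Both parts of Example 3a can be reproduced in a few lines. The sketch below is illustrative; the variable names are assumptions chosen here. It first conditions on whether the policyholder is accident prone, as in Equation (3.1), and then inverts the conditioning to obtain the probability of being accident prone given an accident.

```python
from fractions import Fraction

p_prone = Fraction(3, 10)            # P(A)
p_acc_given_prone = Fraction(4, 10)  # P(A1 | A)
p_acc_given_not = Fraction(2, 10)    # P(A1 | A^c)

# Part 1: condition on whether the policyholder is accident prone.
p_accident = p_acc_given_prone * p_prone + p_acc_given_not * (1 - p_prone)

# Part 2: P(A | A1) = P(A) P(A1 | A) / P(A1).
p_prone_given_accident = p_prone * p_acc_given_prone / p_accident

if __name__ == "__main__":
    print(p_accident, p_prone_given_accident)
```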


Example 3b 


Consider the following game played with an ordinary deck of 52 playing cards: 
The cards are shuffled and then turned over one at a time. At any time, the 
player can guess that the next card to be turned over will be the ace of spades; if 
it is, then the player wins. In addition, the player is said to win if the ace of 
spades has not yet appeared when only one card remains and no guess has yet 
been made. What is a good strategy? What is a bad strategy? 


Solution 


Every strategy has probability 1/52 of winning! To show this, we will use induction 
to prove the stronger result that for an n card deck, one of whose cards is the ace 
of spades, the probability of winning is 1/n, no matter what strategy is employed. 
Since this is clearly true for n = 1, assume it to be true for an $n - 1$ card deck, 
and now consider an n card deck. Fix any strategy, and let p denote the 
probability that the strategy guesses that the first card is the ace of spades. 
Given that it does, the player’s probability of winning is 1/n. If, however, the 
strategy does not guess that the first card is the ace of spades, then the 
probability that the player wins is the probability that the first card is not the ace 
of spades, namely, $(n - 1)/n$, multiplied by the conditional probability of winning 
given that the first card is not the ace of spades. But this latter conditional 
probability is equal to the probability of winning when using an $n - 1$ card deck 
containing a single ace of spades; it is thus, by the induction hypothesis, 
$1/(n - 1)$. Hence, given that the strategy does not guess the first card, the 
probability of winning is 


$$\frac{n-1}{n} \cdot \frac{1}{n-1} = \frac{1}{n}$$


Thus, letting G be the event that the first card is guessed, we obtain 


$$
\begin{aligned}
P\{\text{win}\} &= P\{\text{win}|G\}P(G) + P\{\text{win}|G^c\}(1 - P(G)) = \frac{1}{n}p + \frac{1}{n}(1 - p) \\
&= \frac{1}{n}
\end{aligned}
$$


Example 3c 


In answering a question on a multiple-choice test, a student either knows the 
answer or guesses. Let p be the probability that the student knows the answer 
and 1 — p be the probability that the student guesses. Assume that a student who 
guesses at the answer will be correct with probability 1/m, where m is the 
number of multiple-choice alternatives. What is the conditional probability that a 
student knew the answer to a question given that he or she answered it 
correctly? 


Solution 


Let C and K denote, respectively, the event that the student answers the 
question correctly and the event that he or she actually knows the answer. Now, 


$$
\begin{aligned}
P(K|C) &= \frac{P(KC)}{P(C)} \\
&= \frac{P(C|K)P(K)}{P(C|K)P(K) + P(C|K^c)P(K^c)} \\
&= \frac{p}{p + (1/m)(1 - p)} \\
&= \frac{mp}{1 + (m-1)p}
\end{aligned}
$$


For example, if $m = 5$, $p = \frac{1}{2}$, then the probability that the student knew the 
answer to a question he or she answered correctly is $\frac{5}{6}$. 
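
The closed form $mp/[1 + (m-1)p]$ is easy to check against the unsimplified Bayes ratio. The sketch below is an illustration; the function names are assumptions introduced here. It evaluates both expressions with exact fractions.

```python
from fractions import Fraction


def p_knew_given_correct(p: Fraction, m: int) -> Fraction:
    """P(K | C) from Bayes: p / (p + (1/m)(1 - p))."""
    return p / (p + Fraction(1, m) * (1 - p))


def closed_form(p: Fraction, m: int) -> Fraction:
    """Simplified expression mp / (1 + (m - 1)p)."""
    return m * p / (1 + (m - 1) * p)


if __name__ == "__main__":
    # Five alternatives and an even chance of knowing the answer.
    print(p_knew_given_correct(Fraction(1, 2), 5))
```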


Example 3d 


A laboratory blood test is 95 percent effective in detecting a certain disease when 
it is, in fact, present. However, the test also yields a “false positive” result for 1 
percent of the healthy persons tested. (That is, if a healthy person is tested, then, 
with probability .01, the test result will imply that he or she has the disease.) If .5 
percent of the population actually has the disease, what is the probability that a 
person has the disease given that the test result is positive? 


Solution 


Let D be the event that the person tested has the disease and E the event that 
the test result is positive. Then the desired probability is 


$$
\begin{aligned}
P(D|E) &= \frac{P(DE)}{P(E)} \\
&= \frac{P(E|D)P(D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} \\
&= \frac{(.95)(.005)}{(.95)(.005) + (.01)(.995)} \\
&= \frac{95}{294} \approx .323
\end{aligned}
$$


Thus, only 32 percent of those persons whose test results are positive actually 
have the disease. Many students are often surprised at this result (they expect 
the percentage to be much higher, since the blood test seems to be a good one), 
so it is probably worthwhile to present a second argument that, although less 
rigorous than the preceding one, is probably more revealing. We now do so. 


Since .5 percent of the population actually has the disease, it follows that, on the 
average, 1 person out of every 200 tested will have it. The test will correctly 
confirm that this person has the disease with probability .95. Thus, on the 
average, out of every 200 persons tested, the test will correctly confirm that .95 
person has the disease. On the other hand, out of the (on the average) 199 
healthy people, the test will incorrectly state that (199)(.01) of these people have 
the disease. Hence, for every .95 diseased person whom the test correctly states 
is ill, there are (on the average) (199)(.01) healthy persons whom the test 
incorrectly states are ill. Thus, the proportion of time that the test result is correct 
when it states that a person is ill is 


$$\frac{.95}{.95 + (199)(.01)} = \frac{95}{294} \approx .323$$
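
The two arguments of Example 3d, Bayes's formula and the count per 200 people tested, can be compared directly. The sketch below is illustrative and uses variable names chosen here (`sensitivity`, `false_positive`, `prevalence`), which are not terms used in the example.

```python
from fractions import Fraction

sensitivity = Fraction(95, 100)    # P(E | D)
false_positive = Fraction(1, 100)  # P(E | D^c)
prevalence = Fraction(5, 1000)     # P(D)

# Bayes's formula, conditioning on whether the person has the disease.
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

# Frequency argument: per 200 people tested, 1 is ill and 199 are healthy.
true_positives = 1 * sensitivity
false_positives = 199 * false_positive
by_frequency = true_positives / (true_positives + false_positives)

if __name__ == "__main__":
    print(p_disease_given_positive, float(p_disease_given_positive))
```

Both routes reduce to the same fraction, which is why the informal argument gives exactly the same answer as the formal one.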


Equation (3.1) is also useful when one has to reassess one’s personal 
probabilities in the light of additional information. For instance, consider the 
examples that follow. 


Example 3e 


Consider a medical practitioner pondering the following dilemma: “If I’m at least 
80 percent certain that my patient has this disease, then I always recommend 
surgery, whereas if I’m not quite as certain, then I recommend additional tests 
that are expensive and sometimes painful. Now, initially I was only 60 percent 
certain that Jones had the disease, so I ordered the series A test, which always 
gives a positive result when the patient has the disease and almost never does 
when he is healthy. The test result was positive, and I was all set to recommend 
surgery when Jones informed me, for the first time, that he was diabetic. This 
information complicates matters because, although it doesn’t change my original 
60 percent estimate of his chances of having the disease in question, it does 
affect the interpretation of the results of the A test. This is so because the A test, 
while never yielding a positive result when the patient is healthy, does 
unfortunately yield a positive result 30 percent of the time in the case of diabetic 
patients who are not suffering from the disease. Now what do I do? More tests or 
immediate surgery?” 


Solution 


In order to decide whether or not to recommend surgery, the doctor should first 
compute her updated probability that Jones has the disease given that the A test 
result was positive. Let D denote the event that Jones has the disease and E the 
event that the A test result is positive. The desired conditional probability is then 


$$
\begin{aligned}
P(D|E) &= \frac{P(DE)}{P(E)} \\
&= \frac{P(D)P(E|D)}{P(E|D)P(D) + P(E|D^c)P(D^c)} \\
&= \frac{(.6)1}{1(.6) + (.3)(.4)} \\
&= .833
\end{aligned}
$$


Note that we have computed the probability of a positive test result by 
conditioning on whether or not Jones has the disease and then using the fact that 
because Jones is a diabetic, his conditional probability of a positive result given 
that he does not have the disease, $P(E|D^c)$, equals .3. Hence, as the doctor 
should now be more than 80 percent certain that Jones has the disease, she 
should recommend surgery. 


Example 3f 


At a certain stage of a criminal investigation, the inspector in charge is 60 percent 
convinced of the guilt of a certain suspect. Suppose, however, that a new piece 
of evidence which shows that the criminal has a certain characteristic (such as 
left-handedness, baldness, or brown hair) is uncovered. If 20 percent of the 
population possesses this characteristic, how certain of the guilt of the suspect 
should the inspector now be if it turns out that the suspect has the characteristic? 


Solution 


Letting G denote the event that the suspect is guilty and C the event that he 
possesses the characteristic of the criminal, we have 


$$
\begin{aligned}
P(G|C) &= \frac{P(GC)}{P(C)} \\
&= \frac{P(C|G)P(G)}{P(C|G)P(G) + P(C|G^c)P(G^c)} \\
&= \frac{1(.6)}{1(.6) + (.2)(.4)} \\
&\approx .882
\end{aligned}
$$


where we have supposed that the probability of the suspect having the 
characteristic if he is, in fact, innocent is equal to .2, the proportion of the 
population possessing the characteristic. 
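
Examples 3e and 3f follow the same pattern: a prior probability for a hypothesis is updated by one piece of evidence. A minimal sketch of that update is given below; the function name `posterior` and its parameter names are assumptions made for illustration.

```python
from fractions import Fraction


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) for a hypothesis H and its complement, by Bayes's formula."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))


# Example 3e: prior .6, positive test certain if diseased, .3 if not.
doctor = posterior(Fraction(6, 10), 1, Fraction(3, 10))

# Example 3f: prior .6, characteristic certain if guilty, .2 if innocent.
inspector = posterior(Fraction(6, 10), 1, Fraction(2, 10))

if __name__ == "__main__":
    print(doctor, float(doctor))
    print(inspector, float(inspector))
```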


Example 3g 


In the world bridge championships held in Buenos Aires in May 1965, the famous 
British bridge partnership of Terrence Reese and Boris Schapiro was accused of 
cheating by using a system of finger signals that could indicate the number of 
hearts held by the players. Reese and Schapiro denied the accusation, and 
eventually a hearing was held by the British bridge league. The hearing was in 
the form of a legal proceeding with prosecution and defense teams, both having 
the power to call and cross-examine witnesses. During the course of the 
proceeding, the prosecutor examined specific hands played by Reese and 
Schapiro and claimed that their playing these hands was consistent with the 
hypothesis that they were guilty of having illicit knowledge of the heart suit. At 
this point, the defense attorney pointed out that their play of these hands was 
also perfectly consistent with their standard line of play. However, the prosecution 
then argued that as long as their play was consistent with the hypothesis of guilt, 
it must be counted as evidence toward that hypothesis. What do you think of the 
reasoning of the prosecution? 


Solution 


The problem is basically one of determining how the introduction of new 
evidence (in this example, the playing of the hands) affects the probability of a 
particular hypothesis. If we let H denote a particular hypothesis (such as the 
hypothesis that Reese and Schapiro are guilty) and E the new evidence, then 


$$
\begin{aligned}
P(H|E) &= \frac{P(HE)}{P(E)} \\
&= \frac{P(E|H)P(H)}{P(E|H)P(H) + P(E|H^c)[1 - P(H)]}
\end{aligned}
\tag{3.2}
$$


where P(H) is our evaluation of the likelihood of the hypothesis before the 
introduction of the new evidence. The new evidence will be in support of the 
hypothesis whenever it makes the hypothesis more likely, that is, whenever 
$P(H|E) \geq P(H)$. From Equation (3.2), this will be the case whenever 


$$P(E|H) \geq P(E|H)P(H) + P(E|H^c)[1 - P(H)]$$


or, equivalently, whenever 


$$P(E|H) \geq P(E|H^c)$$


In other words, any new evidence can be considered to be in support of a 
particular hypothesis only if its occurrence is more likely when the hypothesis is 
true than when it is false. In fact, the new probability of the hypothesis depends 
on its initial probability and the ratio of these conditional probabilities, since, from 
Equation (3.2), 


$$P(H|E) = \frac{P(H)}{P(H) + [1 - P(H)]\dfrac{P(E|H^c)}{P(E|H)}}$$


Hence, in the problem under consideration, the play of the cards can be 
considered to support the hypothesis of guilt only if such play would have been 
more likely if the partnership were cheating than if it were not. As the prosecutor 
never made this claim, his assertion that the evidence is in support of the guilt 
hypothesis is invalid. 


Example 3h 


Twins can be either identical or fraternal. Identical, also called monozygotic, twins 
form when a single fertilized egg splits into two genetically identical parts. 
Consequently, identical twins always have the same set of genes. Fraternal, also 
called dizygotic, twins develop when two eggs are fertilized and implant in the 
uterus. The genetic connection of fraternal twins is no more or less the same as 
siblings born at separate times. A Los Angeles County, California, scientist 
wishing to know the current fraction of twin pairs born in the county that are 
identical twins has assigned a county statistician to study this issue. The 
statistician initially requested each hospital in the county to record all twin births, 
indicating whether or not the resulting twins were identical. The hospitals, 
however, told her that to determine whether newborn twins were identical was 
not a simple task, as it involved the permission of the twins’ parents to perform 
complicated and expensive DNA studies that the hospitals could not afford. After 
some deliberation, the statistician just asked the hospitals for data listing all twin 
births along with an indication as to whether the twins were of the same sex. 
When such data indicated that approximately 64 percent of twin births were 
same-sexed, the statistician declared that approximately 28 percent of all twins 
were identical. How did she come to this conclusion? 


Solution 


The statistician reasoned that identical twins are always of the same sex, 
whereas fraternal twins, having the same relationship to each other as any pair of 
siblings, will have probability 1/2 of being of the same sex. Letting I be the event 
that a pair of twins is identical, and SS be the event that a pair of twins is of the 
same sex, she computed the probability P(SS) by conditioning on whether the 
twin pair was identical. This gave 


$$P(SS) = P(SS|I)P(I) + P(SS|I^c)P(I^c)$$


or 


$$P(SS) = 1 \times P(I) + \frac{1}{2} \times [1 - P(I)] = \frac{1}{2} + \frac{1}{2}P(I)$$


which, using that $P(SS) \approx .64$, yielded the result 


$$P(I) \approx .28$$


The change in the probability of a hypothesis when new evidence is introduced can 
be expressed compactly in terms of the change in the odds of that hypothesis, where 
the concept of odds is defined as follows. 


Definition 

The odds of an event A are defined by 


$$\frac{P(A)}{P(A^c)} = \frac{P(A)}{1 - P(A)}$$


That is, the odds of an event A tell how much more likely it is that the event A 
occurs than it is that it does not occur. For instance, if $P(A) = \frac{2}{3}$, then 
$P(A) = 2P(A^c)$, so the odds are 2. If the odds are equal to $\alpha$, then it is common 
to say that the odds are “$\alpha$ to 1” in favor of the hypothesis. 

Consider now a hypothesis H that is true with probability P(H), and suppose that new 
evidence E is introduced. Then, the conditional probabilities, given the evidence E, 
that H is true and that H is not true are respectively given by 


$$P(H|E) = \frac{P(E|H)P(H)}{P(E)}$$


$$P(H^c|E) = \frac{P(E|H^c)P(H^c)}{P(E)}$$


Therefore, the new odds after the evidence E has been introduced are 


$$\frac{P(H|E)}{P(H^c|E)} = \frac{P(H)}{P(H^c)} \cdot \frac{P(E|H)}{P(E|H^c)} \tag{3.3}$$


That is, the new value of the odds of H is the old value multiplied by the ratio of the 
conditional probability of the new evidence given that H is true to the conditional 
probability given that H is not true. Thus, Equation (3.3) verifies the result of 
Example 3g, since the odds, and thus the probability of H, increase whenever the 
new evidence is more likely when H is true than when it is false. Similarly, the odds 
decrease whenever the new evidence is more likely when H is false than when it is 
true. 


Example 3i 


An urn contains two type A coins and one type B coin. When a type A coin is 
flipped, it comes up heads with probability 1/4, whereas when a type B coin is 
flipped, it comes up heads with probability 3/4. A coin is randomly chosen from 
the urn and flipped. Given that the flip landed on heads, what is the probability 
that it was a type A coin? 


Solution 


Let A be the event that a type A coin was flipped, and let $B = A^c$ be the event that 
a type B coin was flipped. We want $P(A|\text{heads})$, where heads is the event that 
the flip landed on heads. From Equation (3.3), we see that 


$$
\begin{aligned}
\frac{P(A|\text{heads})}{P(A^c|\text{heads})} &= \frac{P(A)}{P(B)} \cdot \frac{P(\text{heads}|A)}{P(\text{heads}|B)} \\
&= \frac{2/3}{1/3} \cdot \frac{1/4}{3/4} \\
&= 2/3
\end{aligned}
$$


Hence, the odds are 2/3:1, or, equivalently, the probability is 2/5 that a type A 
coin was flipped. 
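
The odds form of the update in Equation (3.3) translates directly into code. The sketch below is illustrative; the helpers `odds` and `prob_from_odds` are names introduced here. It multiplies the prior odds by the likelihood ratio and converts the resulting odds back into a probability.

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def prob_from_odds(o: Fraction) -> Fraction:
    """Invert the odds: p = o / (1 + o)."""
    return o / (1 + o)


prior_odds = odds(Fraction(2, 3))                 # two type A coins, one type B
likelihood_ratio = Fraction(1, 4) / Fraction(3, 4)  # P(heads|A) / P(heads|B)
posterior_odds = prior_odds * likelihood_ratio
p_type_a = prob_from_odds(posterior_odds)

if __name__ == "__main__":
    print(prior_odds, likelihood_ratio, posterior_odds, p_type_a)
```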


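Equation (3.1) may be generalized as follows: Suppose that $F_1, F_2, \ldots, F_n$ are 
mutually exclusive events such that 


$$\bigcup_{i=1}^{n} F_i = S$$


In other words, exactly one of the events $F_1, F_2, \ldots, F_n$ must occur. By writing 


$$E = \bigcup_{i=1}^{n} EF_i$$


and using the fact that the events $EF_i$, $i = 1, \ldots, n$ are mutually exclusive, we obtain 


$$
\begin{aligned}
P(E) &= \sum_{i=1}^{n} P(EF_i) \\
&= \sum_{i=1}^{n} P(E|F_i)P(F_i)
\end{aligned}
\tag{3.4}
$$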

Thus, Equation (3.4), often referred to as the law of total probability, shows how, 
for given events $F_1, F_2, \ldots, F_n$, of which one and only one must occur, we can compute 
P(E) by first conditioning on which one of the $F_i$ occurs. That is, Equation (3.4) 
states that P(E) is equal to a weighted average of $P(E|F_i)$, each term being 
weighted by the probability of the event on which it is conditioned. 


Example 3j 


In Example 5j of Chapter 2, we considered the probability that, for a 
randomly shuffled deck, the card following the first ace is some specified card, 
and we gave a combinatorial argument to show that this probability is $\frac{1}{52}$. Here 
is a probabilistic argument based on conditioning: Let E be the event that the 
card following the first ace is some specified card, say, card x. To compute P(E), 
we ignore card x and condition on the relative ordering of the other 51 cards in 
the deck. Letting O be the ordering gives 


$$P(E) = \sum_{O} P(E|O)P(O)$$


Now, given O, there are 52 possible orderings of the cards, corresponding to 
having card x being the ith card in the deck, i = 1,...,52. But because all 52! 
possible orderings were initially equally likely, it follows that, conditional on O, 
each of the 52 remaining possible orderings is equally likely. Because card x will 
follow the first ace for only one of these orderings, we have P(E|O) = 1/52, 
implying that P(E) = 1/52. 


Again, let $F_1, \ldots, F_n$ be a set of mutually exclusive and exhaustive events (meaning 
that exactly one of these events must occur). 


Suppose now that E has occurred and we are interested in determining which one of 
the $F_j$ also occurred. Then, by Equation (3.4), we have the following proposition. 


Proposition 3.1 


$$
\begin{aligned}
P(F_j|E) &= \frac{P(EF_j)}{P(E)} \\
&= \frac{P(E|F_j)P(F_j)}{\sum_{i=1}^{n} P(E|F_i)P(F_i)}
\end{aligned}
\tag{3.5}
$$


Equation (3.5) is known as Bayes’s formula, after the English philosopher 
Thomas Bayes. If we think of the events $F_j$ as being possible “hypotheses” about 
some subject matter, then Bayes’s formula may be interpreted as showing us how 
opinions about these hypotheses held before the experiment was carried out [that is, 
the $P(F_j)$] should be modified by the evidence produced by the experiment. 


Example 3k 


A plane is missing, and it is presumed that it was equally likely to have gone 
down in any of 3 possible regions. Let $1 - \beta_i$, $i = 1, 2, 3$, denote the probability 
that the plane will be found upon a search of the ith region when the plane is, in 
fact, in that region. (The constants $\beta_i$ are called overlook probabilities, because 
they represent the probability of overlooking the plane; they are generally 
attributable to the geographical and environmental conditions of the regions.) 
What is the conditional probability that the plane is in the ith region given that a 
search of region 1 is unsuccessful? 


Solution 


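Let $R_i$, $i = 1, 2, 3$, be the event that the plane is in region i, and let E be the event 
that a search of region 1 is unsuccessful. From Bayes’s formula, we obtain 


$$
\begin{aligned}
P(R_1|E) &= \frac{P(ER_1)}{P(E)} \\
&= \frac{P(E|R_1)P(R_1)}{\sum_{i=1}^{3} P(E|R_i)P(R_i)} \\
&= \frac{(\beta_1)\frac{1}{3}}{(\beta_1)\frac{1}{3} + (1)\frac{1}{3} + (1)\frac{1}{3}} \\
&= \frac{\beta_1}{\beta_1 + 2}
\end{aligned}
$$


For j = 2, 3, 


$$
\begin{aligned}
P(R_j|E) &= \frac{P(E|R_j)P(R_j)}{P(E)} \\
&= \frac{(1)\frac{1}{3}}{(\beta_1)\frac{1}{3} + \frac{1}{3} + \frac{1}{3}} \\
&= \frac{1}{\beta_1 + 2}
\end{aligned}
$$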


Note that the updated (that is, the conditional) probability that the plane is in 
region j, given the information that a search of region 1 did not find it, is greater 
than the initial probability that it was in region j when $j \neq 1$ and is less than the 
initial probability when j = 1. This statement is certainly intuitive, since not finding 
the plane in region 1 would seem to decrease its chance of being in that region 
and increase its chance of being elsewhere. Further, the conditional probability 
that the plane is in region 1 given an unsuccessful search of that region is an 
increasing function of the overlook probability $\beta_1$. This statement is also intuitive, 
since the larger $\beta_1$ is, the more it is reasonable to attribute the unsuccessful 
search to “bad luck” as opposed to the plane’s not being there. Similarly, 
$P(R_j|E)$, $j \neq 1$, is a decreasing function of $\beta_1$. 
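
The updated region probabilities can be computed for any overlook probability. The sketch below is illustrative: the function name `region_posteriors` and the sample value 2/5 for the overlook probability of region 1 are assumptions, since the example leaves that constant symbolic.

```python
from fractions import Fraction


def region_posteriors(beta_1: Fraction) -> list:
    """P(R_i | E), i = 1, 2, 3, after an unsuccessful search of region 1."""
    priors = [Fraction(1, 3)] * 3
    # P(E | R_i): the search fails with probability beta_1 if the plane is
    # in region 1, and certainly fails if the plane is elsewhere.
    likelihoods = [beta_1, 1, 1]
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(joint)  # P(E), by the law of total probability
    return [j / total for j in joint]


if __name__ == "__main__":
    beta = Fraction(2, 5)  # hypothetical overlook probability
    print(region_posteriors(beta))
```

Trying a few values of `beta_1` shows the monotonic behavior described in the example: the first entry grows with `beta_1` while the other two shrink.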


The next example has often been used by unscrupulous probability students to win 
money from their less enlightened friends. 


Example 3l 


Suppose that we have 3 cards that are identical in form, except that both sides of 
the first card are colored red, both sides of the second card are colored black, 
and one side of the third card is colored red and the other side black. The 3 cards 
are mixed up in a hat, and 1 card is randomly selected and put down on the 
ground. If the upper side of the chosen card is colored red, what is the probability 
that the other side is colored black? 


Solution 


Let RR, BB, and RB denote, respectively, the events that the chosen card is all 
red, all black, or the red-black card. Also, let R be the event that the upturned 
side of the chosen card is red. Then, the desired probability is obtained by 


$$
\begin{aligned}
P(RB|R) &= \frac{P(RB \cap R)}{P(R)} \\
&= \frac{P(R|RB)P(RB)}{P(R|RR)P(RR) + P(R|RB)P(RB) + P(R|BB)P(BB)} \\
&= \frac{\left(\frac{1}{2}\right)\left(\frac{1}{3}\right)}{(1)\left(\frac{1}{3}\right) + \left(\frac{1}{2}\right)\left(\frac{1}{3}\right) + 0\left(\frac{1}{3}\right)} = \frac{1}{3}
\end{aligned}
$$


Hence, the answer is $\frac{1}{3}$. Some students guess $\frac{1}{2}$ as the answer by incorrectly 
reasoning that given that a red side appears, there are two equally likely 
possibilities: that the card is the all-red card or the red-black card. Their mistake, 
however, is in assuming that these two possibilities are equally likely. For, if we 
think of each card as consisting of two distinct sides, then we see that there are 6 
equally likely outcomes of the experiment, namely, $R_1, R_2, B_1, B_2, R_3, B_3$, where 
the outcome is $R_1$ if the first side of the all-red card is turned face up, $R_2$ if the 
second side of the all-red card is turned face up, $R_3$ if the red side of the 
red-black card is turned face up, and so on. Since the other side of the upturned red 
side will be black only if the outcome is $R_3$, we see that the desired probability is 
the conditional probability of $R_3$ given that either $R_1$ or $R_2$ or $R_3$ occurred, which 
obviously equals $\frac{1}{3}$. 
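
The six-outcome argument can be written out as a direct enumeration. The sketch below is illustrative; the dictionary `cards` and the other names are assumptions introduced here. It lists each (card, upturned side) pair as an equally likely outcome, keeps those showing red, and counts how many of them hide a black side.

```python
from fractions import Fraction

# Each card is a pair of side colors; either side may land face up.
cards = {
    "RR": ("red", "red"),
    "BB": ("black", "black"),
    "RB": ("red", "black"),
}

# Six equally likely outcomes: which card, and which of its sides is up.
outcomes = [(name, up) for name in cards for up in (0, 1)]

red_up = [(name, up) for name, up in outcomes if cards[name][up] == "red"]
black_under = [
    (name, up) for name, up in red_up if cards[name][1 - up] == "black"
]

prob = Fraction(len(black_under), len(red_up))

if __name__ == "__main__":
    print(red_up, black_under, prob)
```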


Example 3m 


A new couple, known to have two children, has just moved into town. Suppose 
that the mother is encountered walking with one of her children. If this child is a 
girl, what is the probability that both children are girls? 


Solution 


Let us start by defining the following events: 

$G_1$: The first (that is, the oldest) child is a girl. 
$G_2$: The second child is a girl. 
$G$: The child seen with the mother is a girl. 


Also, let $B_1$, $B_2$, and $B$ denote similar events, except that “girl” is replaced by 
“boy.” Now, the desired probability is $P(G_1G_2|G)$, which can be expressed as 
follows: 


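$$
\begin{aligned}
P(G_1G_2|G) &= \frac{P(G_1G_2G)}{P(G)} \\
&= \frac{P(G_1G_2)}{P(G)}
\end{aligned}
$$


Also, 


$$
\begin{aligned}
P(G) &= P(G|G_1G_2)P(G_1G_2) + P(G|G_1B_2)P(G_1B_2) + P(G|B_1G_2)P(B_1G_2) + P(G|B_1B_2)P(B_1B_2) \\
&= P(G_1G_2) + P(G|G_1B_2)P(G_1B_2) + P(G|B_1G_2)P(B_1G_2)
\end{aligned}
$$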


where the final equation used the results $P(G|G_1G_2) = 1$ and $P(G|B_1B_2) = 0$. If 
we now make the usual assumption that all 4 gender possibilities are equally 
likely, then we see that 


$$
\begin{aligned}
P(G_1G_2|G) &= \frac{\frac{1}{4}}{\frac{1}{4} + P(G|G_1B_2)/4 + P(G|B_1G_2)/4} \\
&= \frac{1}{1 + P(G|G_1B_2) + P(G|B_1G_2)}
\end{aligned}
$$


Thus, the answer depends on whatever assumptions we want to make about the 
conditional probabilities that the child seen with the mother is a girl given the 
event $G_1B_2$ and that the child seen with the mother is a girl given the event $B_1G_2$. 
For instance, if we want to assume, on the one hand, that, independently of the 
genders of the children, the child walking with the mother is the elder child with 
some probability p, then it would follow that 


$$P(G|G_1B_2) = p = 1 - P(G|B_1G_2)$$


implying under this scenario that 


$$P(G_1G_2|G) = \frac{1}{2}$$


If, on the other hand, we were to assume that if the children are of different 
genders, then the mother would choose to walk with the girl with probability q, 
independently of the birth order of the children, then we would have 


$$P(G|G_1B_2) = P(G|B_1G_2) = q$$


implying that 


$$P(G_1G_2|G) = \frac{1}{1 + 2q}$$


For instance, if we took q = 1, meaning that the mother would always choose to 
walk with a daughter, then the conditional probability that she has two daughters 
would be $\frac{1}{3}$, which is in accord with Example 2b because seeing the mother 
with a daughter is now equivalent to the event that she has at least one daughter. 


Hence, as stated, the problem is incapable of solution. Indeed, even when the 
usual assumption about equally likely gender probabilities is made, we still need 
to make additional assumptions before a solution can be given. This is because 
the sample space of the experiment consists of vectors of the form $(s_1, s_2, i)$, where 
$s_1$ is the gender of the older child, $s_2$ is the gender of the younger child, and i 
identifies the birth order of the child seen with the mother. As a result, to specify 
the probabilities of the events of the sample space, it is not enough to make 
assumptions only about the genders of the children; it is also necessary to 
assume something about the conditional probabilities as to which child is with the 
mother given the genders of the children. 
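
The dependence of the answer on the extra assumption can be made concrete. In the sketch below, which is illustrative only, the function name `p_both_girls` and its parameter names are introduced here, and the four gender combinations are taken to be equally likely as in the example. The function evaluates $1/[1 + P(G|G_1B_2) + P(G|B_1G_2)]$ for whichever conditional probabilities one is willing to assume.

```python
from fractions import Fraction


def p_both_girls(p_girl_seen_gb, p_girl_seen_bg) -> Fraction:
    """P(G1 G2 | G) given P(G | G1 B2) and P(G | B1 G2).

    Assumes each of the four gender combinations has probability 1/4.
    """
    return Fraction(1) / (1 + p_girl_seen_gb + p_girl_seen_bg)


# Scenario 1: the elder child accompanies the mother with probability p.
p = Fraction(3, 10)  # hypothetical value; the result does not depend on it
elder_rule = p_both_girls(p, 1 - p)

# Scenario 2: with mixed genders, the girl is chosen with probability q.
always_girl = p_both_girls(1, 1)                            # q = 1
fair_choice = p_both_girls(Fraction(1, 2), Fraction(1, 2))  # q = 1/2

if __name__ == "__main__":
    print(elder_rule, always_girl, fair_choice)
```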


Example 3n 


A bin contains 3 types of disposable flashlights. The probability that a type 1 
flashlight will give more than 100 hours of use is .7, with the corresponding 
probabilities for type 2 and type 3 flashlights being .4 and .3, respectively. 
Suppose that 20 percent of the flashlights in the bin are type 1, 30 percent are 
type 2, and 50 percent are type 3. 


a. What is the probability that a randomly chosen flashlight will give more 
than 100 hours of use? 

b. Given that a flashlight lasted more than 100 hours, what is the conditional 
probability that it was a type j flashlight, j = 1, 2, 3? 


Solution 


a. Let A denote the event that the flashlight chosen will give more than 100 
hours of use, and let $F_j$ be the event that a type j flashlight is chosen, 
j = 1, 2, 3. To compute P(A), we condition on the type of the flashlight, to 
obtain 


$$
\begin{aligned}
P(A) &= P(A|F_1)P(F_1) + P(A|F_2)P(F_2) + P(A|F_3)P(F_3) \\
&= (.7)(.2) + (.4)(.3) + (.3)(.5) = .41
\end{aligned}
$$


There is a 41 percent chance that the flashlight will last for more than 100
hours.

b. The probability is obtained by using Bayes’s formula:

$$
P(F_j \mid A) = \frac{P(AF_j)}{P(A)} = \frac{P(A \mid F_j)P(F_j)}{.41}
$$

Thus,

$$
\begin{aligned}
P(F_1 \mid A) &= (.7)(.2)/.41 = 14/41 \\
P(F_2 \mid A) &= (.4)(.3)/.41 = 12/41 \\
P(F_3 \mid A) &= (.3)(.5)/.41 = 15/41
\end{aligned}
$$


For instance, whereas the initial probability that a type 1 flashlight is 
chosen is only .2, the information that the flashlight has lasted more than 
100 hours raises the probability of this event to 14/41 ≈ .341.
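

The two parts of Example 3n can be checked with a few lines of exact arithmetic. The following is a minimal sketch, not part of the original solution; the function name `posterior` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return (P(A), [P(F_j | A) for each j]) by the law of total
    probability and Bayes's formula. `posterior` is an illustrative name."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A | F_j) P(F_j)
    total = sum(joint)                                    # P(A)
    return total, [j / total for j in joint]

# Type proportions P(F_j) and lifetimes P(A | F_j) from Example 3n.
priors = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]
likelihoods = [Fraction(7, 10), Fraction(4, 10), Fraction(3, 10)]

p_a, post = posterior(priors, likelihoods)
print("P(A) =", p_a)
print("posteriors =", post)
```

Using `Fraction` keeps the posteriors as exact ratios, so they can be compared directly with the values 14/41, 12/41, and 15/41 obtained above.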


Example 3o


A crime has been committed by a solitary individual, who left some DNA at the
scene of the crime. Forensic scientists who studied the recovered DNA noted
that only five strands could be identified and that each innocent person,
independently, would have a probability of $10^{-5}$ of having his or her DNA match
on all five strands. The district attorney supposes that the perpetrator of the crime
could be any of the 1 million residents of the town. Ten thousand of these
residents have been released from prison within the past 10 years; consequently,
a sample of their DNA is on file. Before any checking of the DNA file, the district
attorney thinks that each of the 10,000 ex-criminals has probability $\alpha$ of being
guilty of the new crime, whereas each of the remaining 990,000 residents has
probability $\beta$, where $\alpha = c\beta$. (That is, the district attorney supposes that each
recently released convict is c times as likely to be the crime’s perpetrator as is
each town member who is not a recently released convict.) When the DNA that is
analyzed is compared against the database of the 10,000 ex-convicts, it turns out
that A. J. Jones is the only one whose DNA matches the profile. Assuming that
the district attorney’s estimate of the relationship between $\alpha$ and $\beta$ is accurate,
what is the probability that A. J. is guilty?


Solution 


To begin, note that because probabilities must sum to 1, we have 


$$
1 = 10{,}000\alpha + 990{,}000\beta = (10{,}000c + 990{,}000)\beta
$$


Thus,

$$
\beta = \frac{1}{10{,}000c + 990{,}000}, \qquad \alpha = \frac{c}{10{,}000c + 990{,}000}
$$


Now, let G be the event that A. J. is guilty, and let M denote the event that A. J. is 
the only one of the 10,000 on file to have a match. Then, 


$$
P(G \mid M) = \frac{P(GM)}{P(M)} = \frac{P(G)P(M \mid G)}{P(M \mid G)P(G) + P(M \mid G^c)P(G^c)}
$$


On the one hand, if A. J. is guilty, then he will be the only one to have a DNA 
match if none of the others on file have a match. Therefore, 


$$
P(M \mid G) = (1 - 10^{-5})^{9999}
$$


On the other hand, if A. J. is innocent, then in order for him to be the only match,
his DNA must match (which will occur with probability $10^{-5}$), all others in the
database must be innocent, and none of these others can have a match. Now,
given that A. J. is innocent, the conditional probability that all the others in the
database are also innocent is

$$
P(\text{all others innocent} \mid \text{AJ innocent}) = \frac{P(\text{all in database innocent})}{P(\text{AJ innocent})} = \frac{1 - 10{,}000\alpha}{1 - \alpha}
$$


Also, the conditional probability, given their innocence, that none of the others in
the database will have a match is $(1 - 10^{-5})^{9999}$. Therefore,

$$
P(M \mid G^c) = 10^{-5}\left(\frac{1 - 10{,}000\alpha}{1 - \alpha}\right)(1 - 10^{-5})^{9999}
$$


Because $P(G) = \alpha$, the preceding formula gives

$$
P(G \mid M) = \frac{\alpha}{\alpha + 10^{-5}(1 - 10{,}000\alpha)} = \frac{1}{.9 + \dfrac{10^{-5}}{\alpha}}
$$


Thus, if the district attorney’s initial thoughts were that an arbitrary ex-convict was
100 times more likely to have committed the crime than was a nonconvict (that is,
c = 100), then $\alpha = \frac{1}{19{,}900}$ and

$$
P(G \mid M) = \frac{1}{1.099} \approx 0.9099
$$


If the district attorney initially thought that the appropriate ratio was c = 10, then
$\alpha = \frac{1}{109{,}000}$ and

$$
P(G \mid M) = \frac{1}{1.99} \approx 0.5025
$$


If the district attorney initially thought that the criminal was equally likely to be any
of the members of the town (c = 1), then $\alpha = 10^{-6}$ and

$$
P(G \mid M) = \frac{1}{10.9} \approx 0.0917
$$


Thus, the probability ranges from approximately 9 percent when the district 
attorney’s initial assumption is that all the members of the population have the 
same chance of being the perpetrator to approximately 91 percent when she 
assumes that each ex-convict is 100 times more likely to be the criminal than is a 
specified townsperson who is not an ex-convict. 
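

The dependence of the answer on the prior ratio c can be tabulated directly from the formula for P(G | M). The sketch below is illustrative only; the function name and its keyword parameters are assumptions made for this example, with defaults taken from the numbers in the problem statement.

```python
def prob_guilty_given_only_match(c, n_file=10_000, n_other=990_000, match=1e-5):
    """P(G | M) for Example 3o, where c is the district attorney's prior
    ratio alpha / beta. Names and defaults are illustrative assumptions."""
    beta = 1 / (n_file * c + n_other)   # prior for each resident not on file
    alpha = c * beta                    # prior for each ex-convict on file
    return alpha / (alpha + match * (1 - n_file * alpha))

for c in (1, 10, 100):
    print(c, round(prob_guilty_given_only_match(c), 4))
```

The common factor $(1 - 10^{-5})^{9999}$ cancels between numerator and denominator, which is why it does not appear in the function.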


3.4 Independent Events 


The previous examples in this chapter show that P(E | F), the conditional probability 
of E given F, is not generally equal to P(E), the unconditional probability of E. In other 
words, knowing that F has occurred generally changes the chances of E’s 
occurrence. In the special cases where P(E | F) does in fact equal P(E), we say that
E is independent of F. That is, E is independent of F if knowledge that F has occurred 
does not change the probability that E occurs. 


Since P(E | F) = P(EF)/P(F), it follows that E is independent of F if

$$
P(EF) = P(E)P(F) \tag{4.1}
$$


The fact that Equation (4.1) is symmetric in E and F shows that whenever E is 
independent of F, F is also independent of E. We thus have the following definition. 
Definition 
Two events E and F are said to be independent if Equation (4.1) holds.


Two events E and F that are not independent are said to be dependent.


Example 4a


A card is selected at random from an ordinary deck of 52 playing cards. If E is the
event that the selected card is an ace and F is the event that it is a spade, then E
and F are independent. This follows because $P(EF) = \frac{1}{52}$, whereas $P(E) = \frac{4}{52}$
and $P(F) = \frac{13}{52}$.


Example 4b 


Two coins are flipped, and all 4 outcomes are assumed to be equally likely. If E is
the event that the first coin lands on heads and F the event that the second lands
on tails, then E and F are independent, since $P(EF) = P(\{(H,T)\}) = \frac{1}{4}$, whereas
$P(E) = P(\{(H,H),(H,T)\}) = \frac{1}{2}$ and $P(F) = P(\{(H,T),(T,T)\}) = \frac{1}{2}$.


Example 4c 


Suppose that we toss 2 fair dice. Let $E_1$ denote the event that the sum of the dice
is 6 and F denote the event that the first die equals 4. Then

$$
P(E_1F) = P(\{(4,2)\}) = \frac{1}{36}
$$


whereas

$$
P(E_1)P(F) = \left(\frac{5}{36}\right)\left(\frac{1}{6}\right) = \frac{5}{216}
$$


Hence, $E_1$ and F are not independent. Intuitively, the reason for this is clear
because if we are interested in the possibility of throwing a 6 (with 2 dice), we
shall be quite happy if the first die lands on 4 (or, indeed, on any of the numbers
1, 2, 3, 4, and 5), for then we shall still have a possibility of getting a total of 6. If,
however, the first die landed on 6, we would be unhappy because we would no
longer have a chance of getting a total of 6. In other words, our chance of getting
a total of 6 depends on the outcome of the first die; thus, $E_1$ and F cannot be
independent.


Now, suppose that we let $E_2$ be the event that the sum of the dice equals 7. Is $E_2$
independent of F? The answer is yes, since

$$
P(E_2F) = P(\{(4,3)\}) = \frac{1}{36}
$$


whereas

$$
P(E_2)P(F) = \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) = \frac{1}{36}
$$


We leave it for the reader to present the intuitive argument why the event that the 
sum of the dice equals 7 is independent of the outcome on the first die. 
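

Because the sample space of two fair dice has only 36 equally likely outcomes, the product rule (4.1) can be checked by brute-force enumeration. This is a small sketch under that equal-likelihood assumption; the helper names `prob` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def independent(e, f):
    """Check Equation (4.1): P(EF) == P(E) P(F)."""
    return prob(lambda o: e(o) and f(o)) == prob(e) * prob(f)

def sum_is_6(o): return o[0] + o[1] == 6      # E_1 of Example 4c
def sum_is_7(o): return o[0] + o[1] == 7      # E_2 of Example 4c
def first_is_4(o): return o[0] == 4           # F of Example 4c

print(independent(sum_is_6, first_is_4))
print(independent(sum_is_7, first_is_4))
```

Enumerating also hints at the intuitive argument left to the reader: whatever the first die shows, exactly one value of the second die gives a total of 7.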


Example 4d 


If we let E denote the event that the next president is a Republican and F the 
event that there will be a major earthquake within the next year, then most people 
would probably be willing to assume that E and F are independent. However,
there would probably be some controversy over whether it is reasonable to 
assume that E is independent of G, where G is the event that there will be a 
recession within two years after the election. 


We now show that if E is independent of F, then E is also independent of $F^c$.


Proposition 4.1


If E and F are independent, then so are E and $F^c$.


Proof. Assume that E and F are independent. Since $E = EF \cup EF^c$ and EF and
$EF^c$ are obviously mutually exclusive, we have

$$
\begin{aligned}
P(E) &= P(EF) + P(EF^c) \\
&= P(E)P(F) + P(EF^c)
\end{aligned}
$$

or, equivalently,

$$
\begin{aligned}
P(EF^c) &= P(E)[1 - P(F)] \\
&= P(E)P(F^c)
\end{aligned}
$$

and the result is proved.


Thus, if E is independent of F, then the probability of E’s occurrence is unchanged by 
information as to whether or not F has occurred. 


Suppose now that E is independent of F and is also independent of G. Is E then 
necessarily independent of FG? The answer, somewhat surprisingly, is no, as the 
following example demonstrates. 


Example 4e


Two fair dice are thrown. Let E denote the event that the sum of the dice is 7. Let
F denote the event that the first die equals 4 and G denote the event that the
second die equals 3. From Example 4c, we know that E is independent of F,
and the same reasoning as applied there shows that E is also independent of G;
but clearly, E is not independent of FG [since P(E | FG) = 1].


It would appear to follow from Example 4e that an appropriate definition of the
independence of three events E, F, and G would have to go further than merely
assuming that all of the $\binom{3}{2}$ pairs of events are independent. We are thus led to the
following definition.


Definition 


Three events E, F, and G are said to be independent if 


$$
\begin{aligned}
P(EFG) &= P(E)P(F)P(G) \\
P(EF) &= P(E)P(F) \\
P(EG) &= P(E)P(G) \\
P(FG) &= P(F)P(G)
\end{aligned}
$$


Note that if E, F, and G are independent, then E will be independent of any event
formed from F and G. For instance, E is independent of $F \cup G$, since

$$
\begin{aligned}
P[E(F \cup G)] &= P(EF \cup EG) \\
&= P(EF) + P(EG) - P(EFG) \\
&= P(E)P(F) + P(E)P(G) - P(E)P(FG) \\
&= P(E)[P(F) + P(G) - P(FG)] \\
&= P(E)P(F \cup G)
\end{aligned}
$$


Of course, we may also extend the definition of independence to more than three
events. The events $E_1, E_2, \ldots, E_n$ are said to be independent if for every subset
$E_{1'}, E_{2'}, \ldots, E_{r'}$, $r \le n$, of these events,

$$
P(E_{1'}E_{2'} \cdots E_{r'}) = P(E_{1'})P(E_{2'}) \cdots P(E_{r'})
$$


Finally, we define an infinite set of events to be independent if every finite subset of 
those events is independent. 
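

The definition for n events requires the product rule to hold for every subset, not just for pairs. The sketch below checks this by enumeration on the two-dice sample space, using the events of Example 4e as a case that is pairwise independent but not independent. The helper names are hypothetical, and the second event family (`first_even`, `second_even`, `first_at_most_2`) is an added illustration, not taken from the text.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

OUTCOMES = list(product(range(1, 7), repeat=2))  # two fair dice

def prob(event):
    """Exact probability of a predicate over the 36 equally likely outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def mutually_independent(events):
    """True if the product rule holds for every subset of size >= 2."""
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            joint = prob(lambda o: all(e(o) for e in subset))
            if joint != prod((prob(e) for e in subset), start=Fraction(1)):
                return False
    return True

# Events of Example 4e.
def E(o): return o[0] + o[1] == 7
def F(o): return o[0] == 4
def G(o): return o[1] == 3

# An illustrative family that does satisfy the full definition.
def first_even(o): return o[0] % 2 == 0
def second_even(o): return o[1] % 2 == 0
def first_at_most_2(o): return o[0] <= 2

pairwise = all(mutually_independent(list(pair)) for pair in combinations([E, F, G], 2))
print(pairwise, mutually_independent([E, F, G]))
print(mutually_independent([first_even, second_even, first_at_most_2]))
```

For E, F, G every pair passes, but the triple fails because P(EFG) = 1/36 while P(E)P(F)P(G) = 1/216.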


Sometimes, a probability experiment under consideration consists of performing a
sequence of subexperiments. For instance, if the experiment consists of continually
tossing a coin, we may think of each toss as being a subexperiment. In many cases,
it is reasonable to assume that the outcomes of any group of the subexperiments
have no effect on the probabilities of the outcomes of the other subexperiments. If
such is the case, we say that the subexperiments are independent. More formally,
we say that the subexperiments are independent if $E_1, E_2, \ldots, E_n, \ldots$ is necessarily an
independent sequence of events whenever $E_i$ is an event whose occurrence is
completely determined by the outcome of the ith subexperiment.


If each subexperiment has the same set of possible outcomes, then the 
subexperiments are often called trials. 


Example 4f 


An infinite sequence of independent trials is to be performed. Each trial results in 
a success with probability p and a failure with probability 1 — p. What is the 
probability that 


a. at least 1 success occurs in the first n trials; 
b. exactly k successes occur in the first n trials; 
c. all trials result in successes? 


Solution 


In order to determine the probability of at least 1 success in the first n trials, it is
easiest to compute first the probability of the complementary event: that of no
successes in the first n trials. If we let $E_i$ denote the event of a failure on the ith
trial, then the probability of no successes is, by independence,

$$
P(E_1E_2 \cdots E_n) = P(E_1)P(E_2) \cdots P(E_n) = (1 - p)^n
$$


Hence, the answer to part (a) is $1 - (1 - p)^n$.


To compute the answer to part (b), consider any particular sequence of the first n
outcomes containing k successes and n − k failures. Each one of these
sequences will, by the assumed independence of trials, occur with probability
$p^k(1 - p)^{n-k}$. Since there are $\binom{n}{k}$ such sequences [there are n!/k!(n − k)!
permutations of k successes and n − k failures], the desired probability in part (b)
is

$$
P\{\text{exactly } k \text{ successes}\} = \binom{n}{k} p^k (1 - p)^{n-k}
$$


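To answer part (c), we note that, by the same reasoning as in part (a), the probability
of the first n trials all resulting in success is given by

$$
P(E_1^c E_2^c \cdots E_n^c) = p^n
$$


Thus, using the continuity property of probabilities (Section 2.6), we see that
the desired probability is given by

$$
P\left(\bigcap_{i=1}^{\infty} E_i^c\right)
= P\left(\lim_{n \to \infty} \bigcap_{i=1}^{n} E_i^c\right)
= \lim_{n \to \infty} P\left(\bigcap_{i=1}^{n} E_i^c\right)
= \lim_{n \to \infty} p^n
= \begin{cases} 0 & \text{if } p < 1 \\ 1 & \text{if } p = 1 \end{cases}
$$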
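

The three answers of Example 4f translate directly into code. The following is a minimal sketch with hypothetical function names; it uses floating-point arithmetic, so results are approximate.

```python
from math import comb

def at_least_one_success(n, p):
    """Part (a): 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

def exactly_k_successes(n, k, p):
    """Part (b): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def first_n_all_successes(n, p):
    """Used in part (c): p^n, which tends to 0 as n grows whenever p < 1."""
    return p ** n

n, p = 10, 0.3
print(at_least_one_success(n, p))
print([exactly_k_successes(n, k, p) for k in range(n + 1)])
print([first_n_all_successes(m, 0.9) for m in (10, 100, 1000)])
```

Summing the part (b) probabilities over k = 0, ..., n gives 1 (up to rounding), and subtracting the k = 0 term recovers part (a).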
Example 4g


A system composed of n separate components is said to be a parallel system if it
functions when at least one of the components functions. (See Figure 3.2.)
For such a system, if component i, which is independent of the other
components, functions with probability $p_i$, i = 1, ..., n, what is the probability that
the system functions?


Figure 3.2 Parallel system: functions if current flows from A to B.


Solution


Let $A_i$ denote the event that component i functions. Then,

$$
\begin{aligned}
P\{\text{system functions}\} &= 1 - P\{\text{system does not function}\} \\
&= 1 - P\{\text{all components do not function}\} \\
&= 1 - P\left(\bigcap_{i} A_i^c\right) \\
&= 1 - \prod_{i=1}^{n} (1 - p_i) \quad \text{by independence}
\end{aligned}
$$
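

As a quick numerical companion to Example 4g, the product formula can be evaluated for any list of component reliabilities. The function name below is a hypothetical choice for this sketch.

```python
from math import prod

def parallel_system_reliability(ps):
    """Probability that at least one of the independent components works:
    1 - prod(1 - p_i)."""
    return 1 - prod(1 - p for p in ps)

print(parallel_system_reliability([0.9, 0.8, 0.5]))
```

Adding a component can only raise the result, since each extra factor (1 − p_i) is at most 1.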
Example 4h


Independent trials consisting of rolling a pair of fair dice are performed. What is 
the probability that an outcome of 5 appears before an outcome of 7 when the 
outcome of a roll is the sum of the dice? 


Solution 


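If we let $E_n$ denote the event that no 5 or 7 appears on the first n − 1 trials and a
5 appears on the nth trial, then the desired probability is

$$
P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)
$$

Now, since $P\{5 \text{ on any trial}\} = \frac{4}{36}$ and $P\{7 \text{ on any trial}\} = \frac{6}{36}$, we obtain, by the
independence of trials,

$$
P(E_n) = \left(1 - \frac{10}{36}\right)^{n-1} \frac{4}{36}
$$


Thus,

$$
P\left(\bigcup_{n=1}^{\infty} E_n\right) = \frac{1}{9}\sum_{n=1}^{\infty}\left(\frac{13}{18}\right)^{n-1} = \frac{1}{9} \cdot \frac{1}{1 - \frac{13}{18}} = \frac{2}{5}
$$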

This result could also have been obtained by the use of conditional probabilities. 
If we let E be the event that a 5 occurs before a 7, then we can obtain the desired 
probability, P(E), by conditioning on the outcome of the first trial, as follows: Let F 
be the event that the first trial results in a 5, let G be the event that it results in a 
7, and let H be the event that the first trial results in neither a 5 nor a 7. Then, 
conditioning on which one of these events occurs gives 


$$
P(E) = P(E \mid F)P(F) + P(E \mid G)P(G) + P(E \mid H)P(H)
$$


However,

$$
\begin{aligned}
P(E \mid F) &= 1 \\
P(E \mid G) &= 0 \\
P(E \mid H) &= P(E)
\end{aligned}
$$


The first two equalities are obvious. The third follows because if the first outcome 
results in neither a 5 nor a 7, then at that point the situation is exactly as it was 
when the problem first started—namely, the experimenter will continually roll a 
pair of fair dice until either a 5 or 7 appears. Furthermore, the trials are
independent; therefore, the outcome of the first trial will have no effect on
subsequent rolls of the dice. Since $P(F) = \frac{4}{36}$, $P(G) = \frac{6}{36}$, and $P(H) = \frac{26}{36}$, it
follows that

$$
P(E) = \frac{1}{9} + P(E)\frac{13}{18}
$$


or

$$
P(E) = \frac{2}{5}
$$


The reader should note that the answer is quite intuitive. That is, because a 5
occurs on any roll with probability $\frac{4}{36}$ and a 7 with probability $\frac{6}{36}$, it seems
intuitive that the odds that a 5 appears before a 7 should be 6 to 4 against. The
probability should then be $\frac{4}{10}$, as indeed it is.


The same argument shows that if E and F are mutually exclusive events of an
experiment, then, when independent trials of the experiment are performed, the
event E will occur before the event F with probability

$$
\frac{P(E)}{P(E) + P(F)}
$$

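
Both routes to the answer of Example 4h, the infinite series and the closed form P(E)/[P(E) + P(F)], can be compared numerically. This is an illustrative sketch; the function names are hypothetical, and the series is truncated at a finite number of terms, so it only approximates the limit.

```python
from fractions import Fraction

def prob_e_before_f(p_e, p_f):
    """Closed form for mutually exclusive E and F: P(E) / (P(E) + P(F))."""
    return p_e / (p_e + p_f)

def partial_series(p_e, p_f, terms):
    """Sum over n = 1..terms of (1 - p_e - p_f)^(n - 1) * p_e: the probability
    that E occurs before F within the first `terms` trials."""
    neither = 1 - p_e - p_f
    return sum(neither ** (n - 1) * p_e for n in range(1, terms + 1))

p5, p7 = Fraction(4, 36), Fraction(6, 36)
print(prob_e_before_f(p5, p7))
print(partial_series(float(p5), float(p7), 200))
```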
Example 4i 


Suppose there are n types of coupons and that each new coupon collected is,
independent of previous selections, a type i coupon with probability $p_i$,
$\sum_{i=1}^{n} p_i = 1$. Suppose k coupons are to be collected. If $A_i$ is the event that there
is at least one type i coupon among those collected, then, for $i \ne j$, find


a. $P(A_i)$
b. $P(A_i \cup A_j)$
c. $P(A_i \mid A_j)$


Solution 


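$$
\begin{aligned}
P(A_i) &= 1 - P(A_i^c) \\
&= 1 - P\{\text{no coupon is type } i\} \\
&= 1 - (1 - p_i)^k
\end{aligned}
$$


where the preceding used that each coupon is, independently, not of type i with
probability $1 - p_i$. Similarly,

$$
\begin{aligned}
P(A_i \cup A_j) &= 1 - P((A_i \cup A_j)^c) \\
&= 1 - P\{\text{no coupon is either type } i \text{ or type } j\} \\
&= 1 - (1 - p_i - p_j)^k
\end{aligned}
$$


where the preceding used that each coupon is, independently, neither of type i
nor type j with probability $1 - p_i - p_j$.


To determine $P(A_i \mid A_j)$, we will use the identity

$$
P(A_i \cup A_j) = P(A_i) + P(A_j) - P(A_iA_j)
$$

which, in conjunction with parts (a) and (b), yields

$$
\begin{aligned}
P(A_iA_j) &= 1 - (1 - p_i)^k + 1 - (1 - p_j)^k - [1 - (1 - p_i - p_j)^k] \\
&= 1 - (1 - p_i)^k - (1 - p_j)^k + (1 - p_i - p_j)^k
\end{aligned}
$$


Consequently,

$$
P(A_i \mid A_j) = \frac{P(A_iA_j)}{P(A_j)} = \frac{1 - (1 - p_i)^k - (1 - p_j)^k + (1 - p_i - p_j)^k}{1 - (1 - p_j)^k}
$$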
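

The closed forms of Example 4i can be sanity-checked against a brute-force enumeration of all coupon sequences for a small case. The sketch below is illustrative; the function names and the particular probabilities (1/2, 1/3, 1/6 with k = 3) are assumptions chosen only to keep the enumeration tiny.

```python
from fractions import Fraction
from itertools import product
from math import prod

def p_has(p_i, k):
    """P(A_i) = 1 - (1 - p_i)^k."""
    return 1 - (1 - p_i) ** k

def p_has_both(p_i, p_j, k):
    """P(A_i A_j) = 1 - (1 - p_i)^k - (1 - p_j)^k + (1 - p_i - p_j)^k."""
    return 1 - (1 - p_i) ** k - (1 - p_j) ** k + (1 - p_i - p_j) ** k

def p_has_i_given_j(p_i, p_j, k):
    """P(A_i | A_j) = P(A_i A_j) / P(A_j)."""
    return p_has_both(p_i, p_j, k) / p_has(p_j, k)

def brute_force_both(ps, i, j, k):
    """Sum the probabilities of all length-k sequences containing types i and j."""
    total = Fraction(0)
    for seq in product(range(len(ps)), repeat=k):
        if i in seq and j in seq:
            total += prod((ps[t] for t in seq), start=Fraction(1))
    return total

ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative values
k = 3
print(p_has_both(ps[0], ps[1], k), brute_force_both(ps, 0, 1, k))
print(p_has_i_given_j(ps[0], ps[1], k))
```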


The next example presents a problem that occupies an honored place in the history 
of probability theory. This is the famous problem of the points. In general terms, the 
problem is this: Two players put up stakes and play some game, with the stakes to 
go to the winner of the game. An interruption requires them to stop before either has 
won and when each has some sort of a “partial score.” How should the stakes be
divided?


This problem was posed to the French mathematician Blaise Pascal in 1654 by the 
Chevalier de Méré, who was a professional gambler at that time. In attacking the 
problem, Pascal introduced the important idea that the proportion of the prize 
deserved by the competitors should depend on their respective probabilities of 
winning if the game were to be continued at that point. Pascal worked out some 
special cases and, more importantly, initiated a correspondence with the famous 
Frenchman Pierre de Fermat, who had a reputation as a great mathematician. The 
resulting exchange of letters not only led to a complete solution to the problem of the 
points, but also laid the framework for the solution to many other problems 
connected with games of chance. This celebrated correspondence, considered by 
some as the birth date of probability theory, was also important in stimulating interest 
in probability among the mathematicians in Europe, for Pascal and Fermat were both 
recognized as being among the foremost mathematicians of the time. For instance, 
within a short time of their correspondence, the young Dutch mathematician 
Christiaan Huygens came to Paris to discuss these problems and solutions, and
interest and activity in this new field grew rapidly. 


Example 4j The problem of the points 


Independent trials resulting in a success with probability p and a failure with 
probability 1 — p are performed. What is the probability that n successes occur 
before m failures? If we think of A and B as playing a game such that A gains 1 
point when a success occurs and B gains 1 point when a failure occurs, then the 
desired probability is the probability that A would win if the game were to be 
continued in a position where A needed n and B needed m more points to win. 


Solution 


We shall present two solutions. The first comes from Pascal and the second from 
Fermat. 


Let us denote by $P_{n,m}$ the probability that n successes occur before m failures.
By conditioning on the outcome of the first trial, we obtain

$$
P_{n,m} = pP_{n-1,m} + (1 - p)P_{n,m-1}, \qquad n \ge 1,\ m \ge 1
$$


(Why? Reason it out.) Using the obvious boundary conditions $P_{n,0} = 0$, $P_{0,m} = 1$,
we can solve these equations for $P_{n,m}$. Rather than go through the tedious
details, let us instead consider Fermat’s solution.


Fermat argued that in order for n successes to occur before m failures, it is
necessary and sufficient that there be at least n successes in the first m + n − 1
trials. (Even if the game were to end before a total of m + n − 1 trials were
completed, we could still imagine that the necessary additional trials were
performed.) This is true, for if there are at least n successes in the first m + n − 1
trials, there could be at most m − 1 failures in those m + n − 1 trials; thus, n
successes would occur before m failures. If, however, there were fewer than n
successes in the first m + n − 1 trials, there would have to be at least m failures
in that same number of trials; thus, n successes would not occur before m
failures.


Hence, since, as shown in Example 4f, the probability of exactly k successes
in m + n − 1 trials is $\binom{m+n-1}{k} p^k (1 - p)^{m+n-1-k}$, it follows that the desired
probability of n successes before m failures is

$$
P_{n,m} = \sum_{k=n}^{m+n-1} \binom{m+n-1}{k} p^k (1 - p)^{m+n-1-k}
$$
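

Pascal's recursion and Fermat's sum should agree for every n, m, and p, and a short program makes that easy to confirm. The following is a sketch under the boundary conditions stated above; the function names `pascal` and `fermat` are labels chosen for this illustration.

```python
from functools import lru_cache
from math import comb

def fermat(n, m, p):
    """Probability of at least n successes in the first m + n - 1 trials."""
    t = m + n - 1
    return sum(comb(t, k) * p ** k * (1 - p) ** (t - k) for k in range(n, t + 1))

def pascal(n, m, p):
    """Solve P(n, m) = p P(n-1, m) + (1-p) P(n, m-1) with
    P(0, m) = 1 and P(n, 0) = 0 by memoized recursion."""
    @lru_cache(maxsize=None)
    def P(a, b):
        if a == 0:
            return 1.0
        if b == 0:
            return 0.0
        return p * P(a - 1, b) + (1 - p) * P(a, b - 1)
    return P(n, m)

print(fermat(2, 3, 0.5), pascal(2, 3, 0.5))
```

For instance, with p = 1/2, n = 2, and m = 3, both functions compute the chance of at least 2 successes in 4 fair trials.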


The following example gives another instance where determining the probability that 
a player wins a match is made easier by assuming that the play continues even after 
the match winner has been determined. 


Example 4k Service protocol in a serve and rally game 


Consider a serve and rally match (such as volleyball, badminton, or squash) 
between two players, A and B. The match consists of a sequence of rallies, with 
each rally beginning with a serve by one of the players and continuing until one 
of the players has won the rally. The winner of the rally receives a point, and the 
match ends when one of the players has won a total of n points, with that player 
being declared the winner of the match. Suppose that whenever a rally begins with A
as the server, A wins that rally with probability $p_A$ and B wins it with
probability $q_A = 1 - p_A$, and that a rally that begins with B as the server is won by
A with probability $p_B$ and by B with probability $q_B = 1 - p_B$. Player A is to be the
initial server. There are two possible server protocols under
consideration: “winner serves,” which means that the winner of a rally is the
server for the next rally, or “alternating serve,” which means that the server
alternates from rally to rally, so that no two consecutive rallies have the same
server. Thus, for instance, if n = 3, then the successive servers under the “winner
serves” protocol would be A, A, B, A, A if A wins the first point, then B the next,
then A wins the next two. On the other hand, the sequence of servers under the
“alternating serve” protocol will always be A, B, A, B, A, ... until the match winner
is decided. If you were player A, which protocol would you prefer?


Solution


Surprisingly, it turns out that it makes no difference, in that the probability that A 
is the match winner is the same under either protocol. To show that this is the 
case, it is advantageous to suppose that the players continue to play until a total 
of 2n — 1 rallies have been completed. The first player to win n rallies would then 
be the one who has won at least n of the 2n — 1 rallies. To begin, note that if the 
alternating serve protocol is being used, then player A will serve exactly n times 
and player B will serve exactly n — 1 times in the 2n — 1 rallies. 


Now consider the winner serve protocol, again assuming that the players 
continue to play until 2n — 1 rallies have been completed. Because it makes no 
difference who serves the “extra rallies” after the match winner has been 

decided, suppose that at the point at which the match has been decided 
(because one of the players has won n points), the remainder (if there are any) of 
the 2n — 1 rallies are all served by the player who lost the match. Note that this 
modified service protocol does not change the fact that the winner of the match 
will still be the player who wins at least n of the 2n — 1 rallies. We claim that 
under this modified service protocol, A will always serve n times and B will 
always serve n — 1 times. Two cases show this. 


Case 1: A wins the match. 


Because A serves first, it follows that A’s second serve will immediately follow A’s 
first point; A’s third serve will immediately follow A’s second point; and, in 
particular, A’s nth serve will immediately follow A’s (n − 1)st point. But this will be
the last serve of A before the match result is decided. This is so because either A 
will win the point on that serve and so have n points, or A will lose the point and 
so the serve will switch to, and remain with, B until A wins point number n. Thus, 
provided that A wins the match, it follows that A would have served a total of n 
times at the moment the match is decided. Because, by the modified service 
protocol, A will never again serve, it follows in this case that A serves exactly n 
times. 


Case 2: B wins the match. 


Because A serves first, B’s first serve will come immediately after B’s first point;
B’s second serve will come immediately after B’s second point; and, in particular,
B’s (n − 1)st serve will come immediately after B’s (n − 1)st point. But that will be the
last serve of B before the match is decided because either B will win the point on
that serve and so have n points, or B will lose the point and so the serve will
switch to, and remain with, A until B wins point number n. Thus, provided that B
wins the match, we see that B would have served a total of n − 1 times at the
moment the match is decided. Because, by the modified service protocol, B will
never again serve, it follows in this case that B serves exactly n − 1 times, and,
as there are a total of 2n − 1 rallies, that A serves exactly n times.


Thus, we see that under either protocol, A will always serve n times and B will
serve n − 1 times and the winner of the match will be the one who wins at least n
points. But since A wins each rally that he serves with probability $p_A$ and wins
each rally that B serves with probability $p_B$, it follows that the probability that A is
the match winner is, under either protocol, equal to the probability that there are
at least n successes in 2n − 1 independent trials, when n of these trials result in
a success with probability $p_A$ and the other n − 1 trials result in a success with
probability $p_B$. Consequently, the win probabilities for both protocols are the
same.
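

The equality of the two protocols can also be observed numerically by computing A's match-win probability directly from the rules of each protocol, without the "extra rallies" device used in the proof. The sketch below is an illustration only; the function name, the state representation (points of A, points of B, current server), and the protocol labels "winner" and "alternating" are assumptions of this example.

```python
from functools import lru_cache

def prob_a_wins(n, p_a, p_b, protocol):
    """Probability that A, the initial server, is first to n points.
    p_a: chance A wins a rally that A serves.
    p_b: chance A wins a rally that B serves.
    protocol: "winner" (rally winner serves next) or "alternating"."""
    @lru_cache(maxsize=None)
    def go(a, b, server):
        if a == n:
            return 1.0
        if b == n:
            return 0.0
        p = p_a if server == "A" else p_b
        if protocol == "winner":
            after_a_wins, after_b_wins = "A", "B"
        else:
            after_a_wins = after_b_wins = "B" if server == "A" else "A"
        return p * go(a + 1, b, after_a_wins) + (1 - p) * go(a, b + 1, after_b_wins)
    return go(0, 0, "A")

for n in (1, 2, 3, 5):
    print(n, prob_a_wins(n, 0.6, 0.3, "winner"), prob_a_wins(n, 0.6, 0.3, "alternating"))
```

The two columns agree up to floating-point rounding for every n tried, in line with the argument above.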


Our next two examples deal with gambling problems, with the first having a 
surprisingly elegant analysis.* 


*The remainder of this section should be considered optional. 


Example 4l


Suppose that initially there are r players, with player i having $n_i$ units,
$n_i > 0$, i = 1, ..., r. At each stage, two of the players are chosen to play a game,
with the winner of the game receiving 1 unit from the loser. Any player whose
fortune drops to 0 is eliminated, and this continues until a single player has all
$n \equiv \sum_{i=1}^{r} n_i$ units, with that player designated as the victor. Assuming that the
results of successive games are independent and that each game is equally
likely to be won by either of its two players, find $P_i$, the probability that player i is
the victor.


Solution 


To begin, suppose that there are n players, with each player initially having 1 unit. 
Consider player i. Each stage she plays will be equally likely to result in her 
either winning or losing 1 unit, with the results from each stage being 
independent. 


In addition, she will continue to play stages until her fortune becomes either 0 or 
n. Because this is the same for all n players, it follows that each player has the 
same chance of being the victor, implying that each player has probability 1/n of 
being the victor. Now, suppose these n players are divided into r teams, with
team i containing $n_i$ players, i = 1, ..., r. Then, the probability that the victor is a
member of team i is $n_i/n$. But because

a. team i initially has a total fortune of $n_i$ units, i = 1, ..., r, and

b. each game played by members of different teams is equally likely to be
won by either player and results in the fortune of members of the winning
team increasing by 1 and the fortune of the members of the losing team
decreasing by 1,


it is easy to see that the probability that the victor is from team i is exactly the
probability we desire. Thus, $P_i = n_i/n$. Interestingly, our argument shows that
this result does not depend on how the players in each stage are chosen.
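

A Monte Carlo experiment gives a rough empirical check of $P_i = n_i/n$. The sketch below assumes one particular selection rule (each stage picks two surviving players uniformly at random), which is only one of the many rules the result covers; the function names, the seed, and the starting fortunes (1, 2, 3) are illustrative choices.

```python
import random

def simulate_victor(fortunes, rng):
    """Play one tournament to the end and return the index of the victor.
    Assumption: the two players at each stage are chosen uniformly at random."""
    fortunes = list(fortunes)
    alive = [i for i, f in enumerate(fortunes) if f > 0]
    while len(alive) > 1:
        i, j = rng.sample(alive, 2)
        winner, loser = (i, j) if rng.random() < 0.5 else (j, i)
        fortunes[winner] += 1
        fortunes[loser] -= 1
        if fortunes[loser] == 0:
            alive.remove(loser)          # eliminated players never play again
    return alive[0]

def estimate_victory_probs(fortunes, trials, seed=0):
    """Relative frequency with which each player ends up as the victor."""
    rng = random.Random(seed)
    wins = [0] * len(fortunes)
    for _ in range(trials):
        wins[simulate_victor(fortunes, rng)] += 1
    return [w / trials for w in wins]

estimates = estimate_victory_probs([1, 2, 3], trials=20_000)
print(estimates)  # to be compared with 1/6, 1/3, 1/2
```

The estimates fluctuate from seed to seed, so they should be read as approximate agreement with $n_i/n$, not as a proof.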


In the gambler’s ruin problem, there are only 2 gamblers, but they are not assumed 
to be of equal skill. 


Example 4m The gambler’s ruin problem 


Two gamblers, A and B, bet on the outcomes of successive flips of a coin. On 
each flip, if the coin comes up heads, A collects 1 unit from B, whereas if it comes 
up tails, A pays 1 unit to B. They continue to do this until one of them runs out of 
money. If it is assumed that the successive flips of the coin are independent and 
each flip results in a head with probability p, what is the probability that A ends up 
with all the money if he starts with i units and B starts with N — i units? 


Solution 


Let E denote the event that A ends up with all the money when he starts with i
and B starts with N − i, and to make clear the dependence on the initial fortune of
A, let $P_i = P(E)$. We shall obtain an expression for P(E) by conditioning on the
outcome of the first flip as follows: Let H denote the event that the first flip lands
on heads; then

$$
\begin{aligned}
P_i = P(E) &= P(E \mid H)P(H) + P(E \mid H^c)P(H^c) \\
&= pP(E \mid H) + (1 - p)P(E \mid H^c)
\end{aligned}
$$


Now, given that the first flip lands on heads, the situation after the first bet is that 
A has i+ 1 units and B has N — (i + 1). Since the successive flips are assumed 
to be independent with a common probability p of heads, it follows that from that 
point on, A’s probability of winning all the money is exactly the same as if the 
game were just starting with A having an initial fortune of i + 1 and B having an 
initial fortune of N — (i+ 1). Therefore, 


$$
P(E \mid H) = P_{i+1}
$$


and similarly,

$$
P(E \mid H^c) = P_{i-1}
$$


Hence, letting q = 1 − p, we obtain

$$
P_i = pP_{i+1} + qP_{i-1}, \qquad i = 1, 2, \ldots, N - 1 \tag{4.2}
$$


By making use of the obvious boundary conditions $P_0 = 0$ and $P_N = 1$, we shall
now solve Equation (4.2). Since p + q = 1, these equations are equivalent to

$$
pP_i + qP_i = pP_{i+1} + qP_{i-1}
$$

or

$$
P_{i+1} - P_i = \frac{q}{p}(P_i - P_{i-1}), \qquad i = 1, 2, \ldots, N - 1 \tag{4.3}
$$


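Hence, since $P_0 = 0$, we obtain, from Equation (4.3),

$$
\begin{aligned}
P_2 - P_1 &= \frac{q}{p}(P_1 - P_0) = \frac{q}{p}P_1 \\
P_3 - P_2 &= \frac{q}{p}(P_2 - P_1) = \left(\frac{q}{p}\right)^{2}P_1 \\
&\ \ \vdots \\
P_i - P_{i-1} &= \frac{q}{p}(P_{i-1} - P_{i-2}) = \left(\frac{q}{p}\right)^{i-1}P_1 \\
&\ \ \vdots \\
P_N - P_{N-1} &= \frac{q}{p}(P_{N-1} - P_{N-2}) = \left(\frac{q}{p}\right)^{N-1}P_1
\end{aligned}
\tag{4.4}
$$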


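Adding the first i − 1 equations of (4.4) yields

$$
P_i - P_1 = P_1\left[\left(\frac{q}{p}\right) + \left(\frac{q}{p}\right)^{2} + \cdots + \left(\frac{q}{p}\right)^{i-1}\right]
$$

or

$$
P_i = \begin{cases}
\dfrac{1 - (q/p)^i}{1 - (q/p)}P_1 & \text{if } \dfrac{q}{p} \ne 1 \\[2ex]
iP_1 & \text{if } \dfrac{q}{p} = 1
\end{cases}
$$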


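Using the fact that $P_N = 1$, we obtain

$$
P_1 = \begin{cases}
\dfrac{1 - (q/p)}{1 - (q/p)^N} & \text{if } p \ne \dfrac{1}{2} \\[2ex]
\dfrac{1}{N} & \text{if } p = \dfrac{1}{2}
\end{cases}
$$

Hence,

$$
P_i = \begin{cases}
\dfrac{1 - (q/p)^i}{1 - (q/p)^N} & \text{if } p \ne \dfrac{1}{2} \\[2ex]
\dfrac{i}{N} & \text{if } p = \dfrac{1}{2}
\end{cases}
\tag{4.5}
$$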


Let $Q_i$ denote the probability that B winds up with all the money when A starts
with i and B starts with N − i. Then, by symmetry to the situation described, and
on replacing p by q and i by N − i, it follows that

$$
Q_i = \begin{cases}
\dfrac{1 - (p/q)^{N-i}}{1 - (p/q)^N} & \text{if } q \ne \dfrac{1}{2} \\[2ex]
\dfrac{N - i}{N} & \text{if } q = \dfrac{1}{2}
\end{cases}
$$


Moreover, since $q = \frac{1}{2}$ is equivalent to $p = \frac{1}{2}$, we have, when $q \ne \frac{1}{2}$,

$$
\begin{aligned}
P_i + Q_i &= \frac{1 - (q/p)^i}{1 - (q/p)^N} + \frac{1 - (p/q)^{N-i}}{1 - (p/q)^N} \\
&= \frac{p^N - p^N(q/p)^i}{p^N - q^N} + \frac{q^N - q^N(p/q)^{N-i}}{q^N - p^N} \\
&= \frac{p^N - p^{N-i}q^i - q^N + q^ip^{N-i}}{p^N - q^N} \\
&= 1
\end{aligned}
$$

This result also holds when $p = q = \frac{1}{2}$, so

$$
P_i + Q_i = 1
$$


In words, this equation states that with probability 1, either A or B will wind up 
with all of the money; in other words, the probability that the game continues 
indefinitely with A’s fortune always being between 1 and N — 1 is zero. (The 
reader must be careful because, a priori, there are three possible outcomes of 
this gambling game, not two: Either A wins, or B wins, or the game goes on 
forever with nobody winning. We have just shown that this last event has 
probability 0.) 


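As a numerical illustration of the preceding result, if A were to start with 5 units
and B with 10, then the probability of A’s winning would be $\frac{1}{3}$ if p were $\frac{1}{2}$,
whereas it would jump to

$$
\frac{1 - \left(\frac{2}{3}\right)^{5}}{1 - \left(\frac{2}{3}\right)^{15}} \approx .87
$$

if p were .6.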
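

Equation (4.5) is straightforward to evaluate, and the result can be checked against both the recursion (4.2) and the identity $P_i + Q_i = 1$. The function below is a minimal floating-point sketch with a hypothetical name; the exact comparison `p == q` is only adequate for illustration.

```python
def ruin_win_prob(i, N, p):
    """P_i from Equation (4.5): probability that A, starting with i of the
    N units in play, ends up with all of them when each flip favors A
    with probability p."""
    q = 1 - p
    if p == q:                      # the fair case p = 1/2
        return i / N
    r = q / p
    return (1 - r ** i) / (1 - r ** N)

print(ruin_win_prob(5, 15, 0.5))
print(ruin_win_prob(5, 15, 0.6))

# Q_i is obtained by swapping roles: B starts with N - i and wins each flip with probability q.
print(ruin_win_prob(5, 15, 0.6) + ruin_win_prob(10, 15, 0.4))
```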


A special case of the gambler’s ruin problem, which is also known as the 
problem of duration of play, was proposed to Huygens by Fermat in 1657. The 
version Huygens proposed, which he himself solved, was that A and B have 12 
coins each. They play for these coins in a game with 3 dice as follows: Whenever 
11 is thrown (by either—it makes no difference who rolls the dice), A gives a coin 
to B. Whenever 14 is thrown, B gives a coin to A. The person who first wins all
the coins wins the game. Since $P\{\text{roll } 11\} = \tfrac{27}{216}$ and
$P\{\text{roll } 14\} = \tfrac{15}{216}$, we see from Example 4h that, for A, this is
just the gambler’s ruin problem with $p = \tfrac{15}{42}$, i = 12, and N = 24. The
general form of the gambler’s ruin problem was solved by the mathematician James
Bernoulli and published in 1713, 8 years after his death.


For an application of the gambler’s ruin problem to drug testing, suppose that
two new drugs have been developed for treating a certain disease. Drug i has a
cure rate $p_i$, i = 1, 2, in the sense that each patient treated with drug i will be
cured with probability $p_i$. These cure rates are, however, not known, and we are
interested in finding a method for deciding whether $p_1 > p_2$ or $p_2 > p_1$. To decide
on one of these alternatives, consider the following test: Pairs of patients are to
be treated sequentially, with one member of the pair receiving drug 1 and the
other drug 2. The results for each pair are determined, and the testing stops
when the cumulative number of cures from one of the drugs exceeds the
cumulative number of cures from the other by some fixed, predetermined
number. More formally, let

$$X_j = \begin{cases} 1 & \text{if the patient in the } j\text{th pair that receives drug 1 is cured} \\ 0 & \text{otherwise} \end{cases}$$

$$Y_j = \begin{cases} 1 & \text{if the patient in the } j\text{th pair that receives drug 2 is cured} \\ 0 & \text{otherwise} \end{cases}$$

For a predetermined positive integer M, the test stops after pair N, where N is the
first value of n such that either

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = M$$

or

$$X_1 + \cdots + X_n - (Y_1 + \cdots + Y_n) = -M$$

In the former case, we assert that $p_1 > p_2$ and in the latter that $p_2 > p_1$.


In order to help ascertain whether the foregoing is a good test, one thing we
would like to know is the probability that it leads to an incorrect decision. That is,
for given $p_1$ and $p_2$, where $p_1 > p_2$, what is the probability that the test will
incorrectly assert that $p_2 > p_1$? To determine this probability, note that after each
pair is checked, the cumulative difference of cures using drug 1 versus drug 2 will
go up by 1 with probability $p_1(1 - p_2)$ (since this is the probability that drug 1
leads to a cure and drug 2 does not), or go down by 1 with probability $(1 - p_1)p_2$,
or remain the same with probability $p_1p_2 + (1 - p_1)(1 - p_2)$. Hence, if we
consider only those pairs in which the cumulative difference changes, then the
difference will go up by 1 with probability

$$p = P\{\text{up } 1 \mid \text{up 1 or down 1}\} = \frac{p_1(1-p_2)}{p_1(1-p_2) + (1-p_1)p_2}$$

and down by 1 with probability

$$1 - p = \frac{p_2(1-p_1)}{p_1(1-p_2) + (1-p_1)p_2}$$
Thus, the probability that the test will assert that $p_2 > p_1$ is equal to the
probability that a gambler who wins each (one-unit) bet with probability p will go
down M before going up M. But Equation (4.5), with i = M, N = 2M, shows
that this probability is given by

$$\begin{aligned}
P\{\text{test asserts that } p_2 > p_1\} &= 1 - \frac{1 - \left(\frac{1-p}{p}\right)^{M}}{1 - \left(\frac{1-p}{p}\right)^{2M}} \\
&= 1 - \frac{1}{1 + \left(\frac{1-p}{p}\right)^{M}} \\
&= \frac{1}{1 + \left(\frac{p}{1-p}\right)^{M}}
\end{aligned}$$

where

$$\frac{p}{1-p} = \frac{p_1(1-p_2)}{p_2(1-p_1)}$$

For instance, if $p_1 = .6$ and $p_2 = .4$, then the probability of an incorrect decision
is .017 when M = 5 and reduces to .0003 when M = 10.
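
The two error probabilities quoted above can be checked with a few lines of code. This is only a sketch: the function name `wrong_decision_prob` is hypothetical, and it uses the closed form obtained by putting i = M and N = 2M into Equation (4.5), which simplifies to 1/(1 + ratio^M) with ratio = p1(1 - p2)/(p2(1 - p1)). It assumes 0 < p2 < p1 < 1.

```python
def wrong_decision_prob(p1, p2, M):
    """Probability that the sequential test asserts p2 > p1 when in fact
    p1 > p2. Assumes 0 < p2 < p1 < 1."""
    up = p1 * (1 - p2)      # drug 1 cures, drug 2 does not: difference goes up
    down = (1 - p1) * p2    # drug 2 cures, drug 1 does not: difference goes down
    ratio = up / down       # equals p / (1 - p) in the text's notation
    return 1 / (1 + ratio**M)


for M in (5, 10):
    print(M, round(wrong_decision_prob(0.6, 0.4, M), 4))
```

Increasing M lowers the error probability geometrically, at the cost of a longer test.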


Example 4n 


A total of 64 teams are selected to play in the end of season NCAA college 
basketball tournament. These 64 are divided into four groups, called brackets, of 
size 16 each, with the teams in each bracket being given seedings ranging from 
1 (the top rated team in the bracket) to 16 (the lowest rated team in the bracket). 
The teams in each bracket play each other in a knockout style tournament, 
meaning a loss knocks a team out of the tournament. Naming a team by its 
seeding, the schedule of games to be played by the teams in a bracket is as 
given by the following graph: 


Figure 3.3 NCAA tournament bracket format

Thus, for instance, teams 1 and 16 play a game in round one, as do teams 8 and
9, with the winners of these games then playing each other in round two. Let
$r(i, j) = r(j, i)$, $i \neq j$, denote the round in which i and j would play if both teams
win up to that point. That is, $r(i, j) = k$ if i and j would play in round k if each won
its first k - 1 games. For instance, r(1, 16) = 1, r(1, 8) = 2, r(1, 5) = 3,
r(1, 6) = 4.


Let us focus on a single one of the brackets, and let us suppose that, no matter
what has previously occurred, if i and j ever play each other then i will win with
probability $p_{i,j} = 1 - p_{j,i}$. Let $P_i$ be the probability that team i is the winner of the
bracket, i = 1, ..., 16. Because $P_i$ is the probability that i wins 4 games, we will
compute the values $P_1, \ldots, P_{16}$ by determining the quantities $P_i(k)$, i = 1, ..., 16,
where $P_i(k)$ is defined to be the probability that i wins its first k games. The
probabilities $P_i(k)$ will be determined recursively, first for k = 1, then for k = 2,
then for k = 3, and finally for k = 4, which will yield $P_i = P_i(4)$.


Let $O_i(k) = \{j : r(i, j) = k\}$ be the set of possible opponents of i in round k. To
compute $P_i(k)$, we will condition on which of the teams in $O_i(k)$ reaches round k.
Because a team will reach round k if that team wins its first k - 1 games, this
gives

$$P_i(k) = \sum_{j \in O_i(k)} P(i \text{ wins its first } k \text{ games} \mid j \text{ reaches round } k)\, P_j(k-1) \tag{4.6}$$


Now, because any team that plays a team in $O_i(k)$ in any of rounds 1, ..., k - 1 is
a possible opponent of team i in round k, it follows that all games in rounds
1, ..., k - 1 involving a team in $O_i(k)$ will be against another team in that set, and
thus the results of these games do not affect which teams i would play in its first
k - 1 games. Consequently, whether team i reaches round k is independent of
which team in $O_i(k)$ reaches round k. Hence, for $j \in O_i(k)$,

$$\begin{aligned}
P(i \text{ wins its first } k \text{ games} \mid j \text{ reaches round } k)
&= P(i \text{ wins its first } k-1 \text{ games},\ i \text{ beats } j \mid j \text{ reaches round } k) \\
&= P(i \text{ wins its first } k-1 \text{ games})\, P(i \text{ beats } j \mid i \text{ and } j \text{ reach round } k) \\
&= P_i(k-1)\, p_{i,j}
\end{aligned}$$


where the next to last equality follows because whether i wins its first k — 1 
games is independent of the event that j wins its first k — 1 games. Hence, from 
(4.6) and the preceding equation, we have that 



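$$\begin{aligned}
P_i(k) &= \sum_{j \in O_i(k)} P_i(k-1)\, p_{i,j}\, P_j(k-1) \\
&= P_i(k-1) \sum_{j \in O_i(k)} p_{i,j}\, P_j(k-1)
\end{aligned} \tag{4.7}$$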


Starting with $P_i(0) = 1$, the preceding enables us to determine $P_i(1)$ for all i,
which then enables us to determine $P_i(2)$ for all i, and so on, up to $P_i = P_i(4)$.


To indicate how to apply the recursive equations (4.7), suppose that
$p_{i,j} = \frac{j}{i+j}$. Thus, for instance, the probability that team 2 (the second seed)
beats team 7 (the seventh seed) is $p_{2,7} = 7/9$. To compute $P_i = P_i(4)$, the
probability that i wins the bracket, start with the quantities $P_i(1)$, i = 1, ..., 16,
equal to the probabilities that each team wins its first game.

$$\begin{aligned}
P_1(1) &= p_{1,16} = 16/17 = 1 - P_{16}(1) \\
P_2(1) &= p_{2,15} = 15/17 = 1 - P_{15}(1) \\
P_3(1) &= p_{3,14} = 14/17 = 1 - P_{14}(1) \\
P_4(1) &= p_{4,13} = 13/17 = 1 - P_{13}(1) \\
P_5(1) &= p_{5,12} = 12/17 = 1 - P_{12}(1) \\
P_6(1) &= p_{6,11} = 11/17 = 1 - P_{11}(1) \\
P_7(1) &= p_{7,10} = 10/17 = 1 - P_{10}(1) \\
P_8(1) &= p_{8,9} = 9/17 = 1 - P_9(1)
\end{aligned}$$


The quantities $P_i(2)$ are then obtained by using the preceding along with the
recursion (4.7). For instance, because the set of possible opponents of team 1
in round 2 is $O_1(2) = \{8, 9\}$, we have that

$$P_1(2) = P_1(1)\left(P_8(1)\,p_{1,8} + P_9(1)\,p_{1,9}\right) = \frac{16}{17}\left(\frac{9}{17}\cdot\frac{8}{9} + \frac{8}{17}\cdot\frac{9}{10}\right) \approx .8415$$

The other quantities $P_2(2), \ldots, P_{16}(2)$ are obtained similarly, and are used to obtain
the quantities $P_i(3)$, i = 1, ..., 16, which are then used to obtain $P_i = P_i(4)$,
i = 1, ..., 16.
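
The recursion (4.7) is mechanical enough to carry out in code. The sketch below is illustrative only. The list `BRACKET` encodes an assumed bracket order (the usual 1-16, 8-9, 5-12, 4-13, 6-11, 3-14, 7-10, 2-15 layout), chosen because it reproduces the values r(1, 16) = 1, r(1, 8) = 2, r(1, 5) = 3, r(1, 6) = 4 given above; the function names are our own.

```python
from fractions import Fraction

# Assumed bracket order: adjacent seeds meet in round 1, adjacent pairs of
# seeds feed round 2, and so on. The figure is not reproduced here, so this
# layout is an assumption consistent with the r(i, j) values in the text.
BRACKET = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def p_win(i, j):
    """p_{i,j} = j / (i + j): probability that seed i beats seed j."""
    return Fraction(j, i + j)


def round_of(i, j):
    """r(i, j): the round in which i and j meet if both keep winning."""
    a, b = BRACKET.index(i), BRACKET.index(j)
    k = 0
    while a != b:
        a, b, k = a // 2, b // 2, k + 1
    return k


def win_probs(rounds=4):
    """Return a list whose k-th entry maps each seed i to P_i(k)."""
    history = [{i: Fraction(1) for i in BRACKET}]  # P_i(0) = 1
    for k in range(1, rounds + 1):
        prev = history[-1]
        cur = {}
        for i in BRACKET:
            opponents = [j for j in BRACKET if j != i and round_of(i, j) == k]
            # Equation (4.7)
            cur[i] = prev[i] * sum(p_win(i, j) * prev[j] for j in opponents)
        history.append(cur)
    return history


history = win_probs()
print(float(history[2][1]))   # P_1(2)
print(float(history[4][1]))   # P_1 = P_1(4)
```

Exact fractions are used so that the bracket-winning probabilities can be checked to sum to 1 without rounding error.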


Suppose that we are presented with a set of elements and we want to determine 
whether at least one member of the set has a certain property. We can attack this 
question probabilistically by randomly choosing an element of the set in such a way 
that each element has a positive probability of being selected. Then the original 
question can be answered by a consideration of the probability that the randomly 
selected element does not have the property of interest. If this probability is equal to 
1, then none of the elements of the set has the property; if it is less than 1, then at 
least one element of the set has the property. 


The final example of this section illustrates this technique. 


Example 4o

The complete graph having n vertices is defined to be a set of n points (called
vertices) in the plane and the $\binom{n}{2}$ lines (called edges) connecting each pair of
vertices. The complete graph having 3 vertices is shown in Figure 3.4.
Suppose now that each edge in a complete graph having n vertices is to be
colored either red or blue. For a fixed integer k, a question of interest is, Is there
a way of coloring the edges so that no set of k vertices has all of its $\binom{k}{2}$
connecting edges the same color? It can be shown by a probabilistic argument
that if n is not too large, then the answer is yes.


Figure 3.4


The argument runs as follows: Suppose that each edge is, independently, equally
likely to be colored either red or blue. That is, each edge is red with probability $\tfrac{1}{2}$.
Number the $\binom{n}{k}$ sets of k vertices and define the events $E_i$, $i = 1, \ldots, \binom{n}{k}$, as
follows:

$$E_i = \{\text{all of the connecting edges of the } i\text{th set of } k \text{ vertices are the same color}\}$$

Now, since each of the $\binom{k}{2}$ connecting edges of a set of k vertices is equally
likely to be either red or blue, it follows that the probability that they are all the
same color is

$$P(E_i) = 2\left(\frac{1}{2}\right)^{k(k-1)/2}$$

Therefore, because

$$P\Big(\bigcup_i E_i\Big) \le \sum_i P(E_i) \qquad \text{(Boole’s inequality)}$$

we find that $P\big(\bigcup_i E_i\big)$, the probability that there is a set of k vertices all of whose
connecting edges are similarly colored, satisfies

$$P\Big(\bigcup_i E_i\Big) \le \binom{n}{k}\left(\frac{1}{2}\right)^{k(k-1)/2 - 1}$$

Hence, if

$$\binom{n}{k}\left(\frac{1}{2}\right)^{k(k-1)/2 - 1} < 1$$

or, equivalently, if

$$\binom{n}{k} < 2^{k(k-1)/2 - 1}$$

then the probability that at least one of the $\binom{n}{k}$ sets of k vertices has all of its
connecting edges the same color is less than 1. Consequently, under the
preceding condition on n and k, it follows that there is a positive probability that
no set of k vertices has all of its connecting edges the same color. But this
conclusion implies that there is at least one way of coloring the edges for which
no set of k vertices has all of its connecting edges the same color.
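
As a rough illustration of the argument, the sketch below checks the sufficient condition and then searches for a valid coloring by drawing random colorings until one works. All function names here are hypothetical, and the retry cap `max_tries` is an arbitrary safeguard we added; when the condition holds, each random coloring succeeds with positive probability, so only a few tries are typically needed.

```python
import itertools
import math
import random


def bound_holds(n, k):
    """True when C(n, k) < 2**(k(k-1)/2 - 1), the sufficient condition above."""
    return math.comb(n, k) < 2 ** (k * (k - 1) // 2 - 1)


def has_monochromatic_set(coloring, n, k):
    """True if some set of k vertices has all connecting edges the same color."""
    for verts in itertools.combinations(range(n), k):
        colors = {coloring[e] for e in itertools.combinations(verts, 2)}
        if len(colors) == 1:
            return True
    return False


def random_good_coloring(n, k, rng, max_tries=10_000):
    """Color each edge red ('R') or blue ('B') independently with probability
    1/2 each, repeating until no k-set is monochromatic (or giving up)."""
    edges = list(itertools.combinations(range(n), 2))
    for _ in range(max_tries):
        coloring = {e: rng.choice("RB") for e in edges}
        if not has_monochromatic_set(coloring, n, k):
            return coloring
    return None


print(bound_holds(6, 4), bound_holds(7, 4))
coloring = random_good_coloring(6, 4, random.Random(0))
print(coloring is not None)
```

Note that the condition is only sufficient: a valid coloring may well exist for values of n where `bound_holds` returns False.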


Remarks 


a. Whereas the preceding argument established a condition on n and k that
guarantees the existence of a coloring scheme satisfying the desired property,
it gives no information about how to obtain such a scheme (although one
possibility would be simply to choose the colors at random, check to see if the
resulting coloring satisfies the property, and repeat the procedure until it
does).

b. The method of introducing probability into a problem whose statement is
purely deterministic has been called the probabilistic method.† Other
examples of this method are given in Theoretical Exercise 24 and
Examples 2t and 2u of Chapter 7.

†See N. Alon, J. Spencer, and P. Erdős, The Probabilistic Method (New York: John
Wiley & Sons, Inc., 1992).


3.5 P(·|F) Is a Probability


Conditional probabilities satisfy all of the properties of ordinary probabilities, as is
proved by Proposition 5.1, which shows that P(E|F) satisfies the three axioms of
a probability.


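Proposition 5.1
a. $0 \le P(E \mid F) \le 1$.
b. $P(S \mid F) = 1$.
c. If $E_i$, i = 1, 2, ..., are mutually exclusive events, then

$$P\Big(\bigcup_{i=1}^{\infty} E_i \,\Big|\, F\Big) = \sum_{i=1}^{\infty} P(E_i \mid F)$$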

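Proof. To prove part (a), we must show that $0 \le P(EF)/P(F) \le 1$. The left-side
inequality is obvious, whereas the right side follows because $EF \subset F$, which
implies that $P(EF) \le P(F)$. Part (b) follows because

$$P(S \mid F) = \frac{P(SF)}{P(F)} = \frac{P(F)}{P(F)} = 1$$

Part (c) follows from

$$\begin{aligned}
P\Big(\bigcup_{i=1}^{\infty} E_i \,\Big|\, F\Big)
&= \frac{P\Big(\big(\bigcup_{i=1}^{\infty} E_i\big)F\Big)}{P(F)} \\
&= \frac{P\Big(\bigcup_{i=1}^{\infty} E_iF\Big)}{P(F)} \\
&= \frac{\sum_{i=1}^{\infty} P(E_iF)}{P(F)} \\
&= \sum_{i=1}^{\infty} P(E_i \mid F)
\end{aligned}$$

where the next-to-last equality follows because $E_iE_j = \varnothing$ implies that
$E_iFE_jF = \varnothing$.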

If we define Q(E) = P(E|F), then, from Proposition 5.1, Q(E) may be regarded
as a probability function on the events of S. Hence, all of the propositions previously
proved for probabilities apply to Q(E). For instance, we have

$$Q(E_1 \cup E_2) = Q(E_1) + Q(E_2) - Q(E_1E_2)$$

or, equivalently,

$$P(E_1 \cup E_2 \mid F) = P(E_1 \mid F) + P(E_2 \mid F) - P(E_1E_2 \mid F)$$

Also, if we define the conditional probability $Q(E_1 \mid E_2)$ by
$Q(E_1 \mid E_2) = Q(E_1E_2)/Q(E_2)$, then, from Equation (3.1), we have

$$Q(E_1) = Q(E_1 \mid E_2)Q(E_2) + Q(E_1 \mid E_2^c)Q(E_2^c) \tag{5.1}$$

Since

$$\begin{aligned}
Q(E_1 \mid E_2) &= \frac{Q(E_1E_2)}{Q(E_2)} \\
&= \frac{P(E_1E_2 \mid F)}{P(E_2 \mid F)} \\
&= \frac{P(E_1E_2F)/P(F)}{P(E_2F)/P(F)} \\
&= P(E_1 \mid E_2F)
\end{aligned}$$

Equation (5.1) is equivalent to

$$P(E_1 \mid F) = P(E_1 \mid E_2F)P(E_2 \mid F) + P(E_1 \mid E_2^cF)P(E_2^c \mid F)$$


Example 5a 


Consider Example 3a, which is concerned with an insurance company that
believes that people can be divided into two distinct classes: those who are 
accident prone and those who are not. During any given year, an accident-prone 
person will have an accident with probability .4, whereas the corresponding figure 
for a person who is not prone to accidents is .2. What is the conditional 
probability that a new policyholder will have an accident in his or her second year 
of policy ownership, given that the policyholder has had an accident in the first 
year? 


Solution 


If we let A be the event that the policyholder is accident prone and we let
$A_i$, i = 1, 2, be the event that he or she has had an accident in the ith year, then
the desired probability $P(A_2 \mid A_1)$ may be obtained by conditioning on whether or
not the policyholder is accident prone, as follows:

$$P(A_2 \mid A_1) = P(A_2 \mid AA_1)P(A \mid A_1) + P(A_2 \mid A^cA_1)P(A^c \mid A_1)$$

Now,

$$P(A \mid A_1) = \frac{P(A_1A)}{P(A_1)} = \frac{P(A_1 \mid A)P(A)}{P(A_1)}$$

However, P(A) is assumed to equal $\tfrac{3}{10}$, and it was shown in Example 3a that
$P(A_1) = .26$. Hence,

$$P(A \mid A_1) = \frac{(.4)(.3)}{.26} = \frac{6}{13}$$

Thus,

$$P(A^c \mid A_1) = 1 - P(A \mid A_1) = \frac{7}{13}$$

Since $P(A_2 \mid AA_1) = P(A_2 \mid A) = .4$ and $P(A_2 \mid A^cA_1) = P(A_2 \mid A^c) = .2$, it follows
that

$$P(A_2 \mid A_1) = (.4)\frac{6}{13} + (.2)\frac{7}{13} \approx .29$$
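
The arithmetic of this solution can be traced step by step with exact fractions. The variable names below are ours; the numbers (.3, .4, .2) are those of the example.

```python
from fractions import Fraction

p_prone = Fraction(3, 10)        # P(A): policyholder is accident prone
p_acc_prone = Fraction(4, 10)    # yearly accident probability if accident prone
p_acc_not = Fraction(2, 10)      # yearly accident probability otherwise

# P(A_1), by conditioning on whether the policyholder is accident prone
p_a1 = p_acc_prone * p_prone + p_acc_not * (1 - p_prone)

# P(A | A_1), by Bayes's formula
post_prone = p_acc_prone * p_prone / p_a1

# P(A_2 | A_1), conditioning again with the updated probabilities
p_a2_given_a1 = p_acc_prone * post_prone + p_acc_not * (1 - post_prone)

print(p_a1, post_prone, p_a2_given_a1, float(p_a2_given_a1))
```

The last step uses the conditional independence of the yearly accidents given the policyholder's type, which is why the same .4 and .2 reappear.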


Example 5b 


A female chimp has given birth. It is not certain, however, which of two male 
chimps is the father. Before any genetic analysis has been performed, it is 
believed that the probability that male number 1 is the father is p and the 
probability that male number 2 is the father is 1 — p. DNA obtained from the 
mother, male number 1, and male number 2 indicates that on one specific 
location of the genome, the mother has the gene pair (A, A), male number 1 has 
the gene pair (a,a), and male number 2 has the gene pair (A, a). If a DNA test 
shows that the baby chimp has the gene pair (A, a), what is the probability that 
male number 1 is the father? 


Solution 


Let all probabilities be conditional on the event that the mother has the gene pair
(A, A), male number 1 has the gene pair (a, a), and male number 2 has the gene
pair (A, a). Now, let $M_i$ be the event that male number i, i = 1, 2, is the father, and
let $B_{A,a}$ be the event that the baby chimp has the gene pair (A, a). Then,
$P(M_1 \mid B_{A,a})$ is obtained as follows:

$$\begin{aligned}
P(M_1 \mid B_{A,a}) &= \frac{P(M_1B_{A,a})}{P(B_{A,a})} \\
&= \frac{P(B_{A,a} \mid M_1)P(M_1)}{P(B_{A,a} \mid M_1)P(M_1) + P(B_{A,a} \mid M_2)P(M_2)} \\
&= \frac{1 \cdot p}{1 \cdot p + (1/2)(1-p)} \\
&= \frac{2p}{1+p}
\end{aligned}$$

Because $\frac{2p}{1+p} > p$ when p < 1, the information that the baby’s gene pair is (A, a)
increases the probability that male number 1 is the father. This result is intuitive
because it is more likely that the baby would have gene pair (A, a) if $M_1$ is true
than if $M_2$ is true (the respective conditional probabilities being 1 and 1/2).
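
A short sketch of the same Bayes computation, with a hypothetical function name `posterior_male1`, confirms the closed form 2p/(1 + p) and shows that the posterior exceeds the prior for any p strictly between 0 and 1.

```python
from fractions import Fraction


def posterior_male1(p):
    """P(M_1 | baby has gene pair (A, a)) for prior P(M_1) = p."""
    like1 = Fraction(1)       # P(baby is (A, a) | male 1 is the father)
    like2 = Fraction(1, 2)    # P(baby is (A, a) | male 2 is the father)
    return like1 * p / (like1 * p + like2 * (1 - p))


for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(p, posterior_male1(p), 2 * p / (1 + p))
```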


The next example deals with a problem in the theory of runs. 


Example 5c 


Independent trials, each resulting in a success with probability p or a failure with
probability q = 1 - p, are performed. We are interested in computing the
probability that a run of n consecutive successes occurs before a run of m
consecutive failures.


Solution 


Let E be the event that a run of n consecutive successes occurs before a run of 
m consecutive failures. To obtain P(E), we start by conditioning on the outcome 
of the first trial. That is, letting H denote the event that the first trial results in a 
success, we obtain 


$$P(E) = pP(E \mid H) + qP(E \mid H^c) \tag{5.2}$$

Now, given that the first trial was successful, one way we can get a run of n
successes before a run of m failures would be to have the next n - 1 trials all
result in successes. So, let us condition on whether or not that occurs. That is,
letting F be the event that trials 2 through n all are successes, we obtain

$$P(E \mid H) = P(E \mid FH)P(F \mid H) + P(E \mid F^cH)P(F^c \mid H) \tag{5.3}$$

On the one hand, clearly, $P(E \mid FH) = 1$; on the other hand, if the event $F^cH$
occurs, then the first trial would result in a success, but there would be a failure
some time during the next n - 1 trials. However, when this failure occurs, it would
wipe out all of the previous successes, and the situation would be exactly as if
we started out with a failure. Hence,

$$P(E \mid F^cH) = P(E \mid H^c)$$

Because the independence of trials implies that F and H are independent, and
because $P(F) = p^{n-1}$, it follows from Equation (5.3) that

$$P(E \mid H) = p^{n-1} + (1 - p^{n-1})P(E \mid H^c) \tag{5.4}$$


We now obtain an expression for $P(E \mid H^c)$ in a similar manner. That is, we let G
denote the event that trials 2 through m are all failures. Then,

$$P(E \mid H^c) = P(E \mid GH^c)P(G \mid H^c) + P(E \mid G^cH^c)P(G^c \mid H^c) \tag{5.5}$$

Now, $GH^c$ is the event that the first m trials all result in failures, so $P(E \mid GH^c) = 0$.
Also, if $G^cH^c$ occurs, then the first trial is a failure, but there is at least one
success in the next m - 1 trials. Hence, since this success wipes out all previous
failures, we see that

$$P(E \mid G^cH^c) = P(E \mid H)$$

Thus, because $P(G^c \mid H^c) = P(G^c) = 1 - q^{m-1}$, we obtain, from (5.5),

$$P(E \mid H^c) = (1 - q^{m-1})P(E \mid H) \tag{5.6}$$


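Solving Equations (5.4) and (5.6) yields

$$P(E \mid H) = \frac{p^{n-1}}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}}$$

and

$$P(E \mid H^c) = \frac{(1 - q^{m-1})\,p^{n-1}}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}}$$

Thus,

$$\begin{aligned}
P(E) &= pP(E \mid H) + qP(E \mid H^c) \\
&= \frac{p^n + q\,p^{n-1}(1 - q^{m-1})}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}} \\
&= \frac{p^{n-1}(1 - q^m)}{p^{n-1} + q^{m-1} - p^{n-1}q^{m-1}}
\end{aligned} \tag{5.7}$$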


It is interesting to note that by the symmetry of the problem, the probability of
obtaining a run of m failures before a run of n successes would be given by
Equation (5.7) with p and q interchanged and n and m interchanged. Hence,
this probability would equal

$$P\{\text{run of } m \text{ failures before a run of } n \text{ successes}\} = \frac{q^{m-1}(1 - p^n)}{q^{m-1} + p^{n-1} - q^{m-1}p^{n-1}} \tag{5.8}$$

Since Equations (5.7) and (5.8) sum to 1, it follows that, with probability 1,
either a run of n successes or a run of m failures will eventually occur.


As an example of Equation (5.7), we note that, in tossing a fair coin, the
probability that a run of 2 heads will precede a run of 3 tails is $\tfrac{7}{10}$. For 2
consecutive heads before 4 consecutive tails, the probability rises to $\tfrac{5}{6}$.
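
Equation (5.7) and the two fair-coin values can be verified with exact arithmetic. In this sketch the function name `run_successes_first` is our own; Equation (5.8) is obtained from it by swapping the roles of successes and failures.

```python
from fractions import Fraction


def run_successes_first(n, m, p):
    """Equation (5.7): probability that a run of n successes occurs before a
    run of m failures, with success probability p on each independent trial."""
    q = 1 - p
    denom = p ** (n - 1) + q ** (m - 1) - p ** (n - 1) * q ** (m - 1)
    return p ** (n - 1) * (1 - q ** m) / denom


half = Fraction(1, 2)
print(run_successes_first(2, 3, half))   # 2 heads before 3 tails
print(run_successes_first(2, 4, half))   # 2 heads before 4 tails

# Equation (5.8) is (5.7) with p and q, and n and m, interchanged.
p = Fraction(3, 10)
total = run_successes_first(4, 3, p) + run_successes_first(3, 4, 1 - p)
print(total)
```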


In our next example, we return to the matching problem and obtain a solution by 
using conditional probabilities. 


Example 5d 


At a party, n people take off their hats. The hats are then mixed up, and each 
person randomly selects one. We say that a match occurs if a person selects his 
or her own hat. What is the probability of 


a. no matches? 
b. exactly k matches? 


Solution 


a. Let E denote the event that no matches occur, and to make explicit the
dependence on n, write $P_n = P(E)$. We start by conditioning on whether or
not the first person selects his or her own hat; call these events M and
$M^c$, respectively. Then,

$$P_n = P(E) = P(E \mid M)P(M) + P(E \mid M^c)P(M^c)$$

Clearly, $P(E \mid M) = 0$, so

$$P_n = P(E \mid M^c)\,\frac{n-1}{n} \tag{5.9}$$

Now, $P(E \mid M^c)$ is the probability of no matches when n - 1 people select
from a set of n - 1 hats, when one person, called the “extra” person, does
not have their hat in the collection, and one hat, called the “extra” hat,
does not belong to any of the people. This can happen in either of two
mutually exclusive ways: Either there are no matches and the extra
person does not select the extra hat (this being the hat of the person who
chose first) or there are no matches and the extra person does select the
extra hat. The probability of the first of these events is just $P_{n-1}$, which is
seen by regarding the extra hat as “belonging” to the extra person.
Because the second event has probability $[1/(n-1)]P_{n-2}$, we have

$$P(E \mid M^c) = P_{n-1} + \frac{1}{n-1}P_{n-2}$$

Thus, from Equation (5.9),

$$P_n = \frac{n-1}{n}P_{n-1} + \frac{1}{n}P_{n-2}$$

or, equivalently,

$$P_n - P_{n-1} = -\frac{1}{n}\left(P_{n-1} - P_{n-2}\right) \tag{5.10}$$

However, since $P_n$ is the probability of no matches when n people select
among their own hats, we have

$$P_1 = 0 \qquad P_2 = \frac{1}{2}$$

So, from Equation (5.10),

$$P_3 - P_2 = -\frac{P_2 - P_1}{3} = -\frac{1}{3!} \quad \text{or} \quad P_3 = \frac{1}{2!} - \frac{1}{3!}$$

$$P_4 - P_3 = -\frac{P_3 - P_2}{4} = \frac{1}{4!} \quad \text{or} \quad P_4 = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!}$$

and, in general,

$$P_n = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \cdots + \frac{(-1)^n}{n!}$$

b. To obtain the probability of exactly k matches, we consider any fixed group
of k people. The probability that they, and only they, select their own hats
is

$$\frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{n-(k-1)}\, P_{n-k} = \frac{(n-k)!}{n!}\, P_{n-k}$$

where $P_{n-k}$ is the conditional probability that the other n - k people,
selecting among their own hats, have no matches. Since there are $\binom{n}{k}$
choices of a set of k people, the desired probability of exactly k matches is

$$\frac{P_{n-k}}{k!} = \frac{\dfrac{1}{2!} - \dfrac{1}{3!} + \cdots + \dfrac{(-1)^{n-k}}{(n-k)!}}{k!}$$
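
The recursion for the no-match probability lends itself to a direct computation. The sketch below is illustrative: the function names are ours, and it adopts the convention P_0 = 1 (an assumption not stated in the text, but consistent with P_1 = 0 and P_2 = 1/2 under the recursion) so that the formula for exactly k matches also covers k = n.

```python
from fractions import Fraction
from math import factorial


def no_match_probs(n):
    """Return [P_0, P_1, ..., P_n] using
    P_j = ((j - 1)/j) P_{j-1} + (1/j) P_{j-2}, with the convention P_0 = 1."""
    P = [Fraction(1), Fraction(0)]
    for j in range(2, n + 1):
        P.append(Fraction(j - 1, j) * P[j - 1] + Fraction(1, j) * P[j - 2])
    return P[: n + 1]


def exactly_k_matches(n, k):
    """Probability of exactly k matches among n people: P_{n-k} / k!."""
    return no_match_probs(n - k)[n - k] / factorial(k)


print(no_match_probs(4))
print([exactly_k_matches(5, k) for k in range(6)])
```

As a consistency check, the probabilities of exactly 0, 1, ..., n matches should sum to 1, and exactly n - 1 matches is impossible.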


An important concept in probability theory is that of the conditional independence
of events. We say that the events $E_1$ and $E_2$ are conditionally independent given
F if given that F occurs, the conditional probability that $E_1$ occurs is unchanged
by information as to whether or not $E_2$ occurs. More formally, $E_1$ and $E_2$ are said
to be conditionally independent given F if

$$P(E_1 \mid E_2F) = P(E_1 \mid F) \tag{5.11}$$

or, equivalently,

$$P(E_1E_2 \mid F) = P(E_1 \mid F)P(E_2 \mid F) \tag{5.12}$$


The notion of conditional independence can easily be extended to more than two 
events, and this extension is left as an exercise. 


The reader should note that the concept of conditional independence was 
implicitly employed in Example 5a, where it was assumed that the events that
a policyholder had an accident in his or her ith year, i = 1, 2, ..., were conditionally 
independent given whether or not the person was accident prone. The following 
example, sometimes referred to as Laplace’s rule of succession, further 
illustrates the concept of conditional independence. 


Example 5e Laplace’s rule of succession 


There are k + 1 coins in a box. When flipped, the ith coin will turn up heads with 
probability i/k,i = 0,1,...,k. A coin is randomly selected from the box and is then 
repeatedly flipped. If the first n flips all result in heads, what is the conditional 
probability that the (n + 1)st flip will do likewise?


Solution 


Letting $H_n$ denote the event that the first n flips all land heads, the desired
probability is

$$P(H_{n+1} \mid H_n) = \frac{P(H_{n+1}H_n)}{P(H_n)} = \frac{P(H_{n+1})}{P(H_n)}$$

To compute $P(H_n)$, we condition on which coin is chosen. That is, letting $C_i$
denote the event that coin i is selected, we have that

$$P(H_n) = \sum_{i=0}^{k} P(H_n \mid C_i)P(C_i)$$

Now, given that coin i is selected, it is reasonable to assume that the outcomes
will be conditionally independent, with each one resulting in a head with
probability i/k. Hence,

$$P(H_n \mid C_i) = (i/k)^n$$

As $P(C_i) = \frac{1}{k+1}$, this yields that

$$P(H_n) = \frac{1}{k+1}\sum_{i=0}^{k} (i/k)^n$$

Thus,

$$P(H_{n+1} \mid H_n) = \frac{\sum_{i=0}^{k} (i/k)^{n+1}}{\sum_{i=0}^{k} (i/k)^n}$$

If k is large, we can use the integral approximations

$$\frac{1}{k}\sum_{i=0}^{k} \left(\frac{i}{k}\right)^{n+1} \approx \int_0^1 x^{n+1}\,dx = \frac{1}{n+2}$$

$$\frac{1}{k}\sum_{i=0}^{k} \left(\frac{i}{k}\right)^{n} \approx \int_0^1 x^{n}\,dx = \frac{1}{n+1}$$

So, for k large,

$$P(H_{n+1} \mid H_n) \approx \frac{n+1}{n+2}$$
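
To see how good the approximation is, the exact ratio of sums can be compared with (n + 1)/(n + 2). This is a small sketch with a hypothetical function name, `next_head_prob`; the choice k = 1000 is arbitrary.

```python
from fractions import Fraction


def next_head_prob(n, k):
    """Exact P(H_{n+1} | H_n) when one of k + 1 coins, with head
    probabilities i/k for i = 0, ..., k, is selected uniformly at random."""
    num = sum(Fraction(i, k) ** (n + 1) for i in range(k + 1))
    den = sum(Fraction(i, k) ** n for i in range(k + 1))
    return num / den


n, k = 5, 1000
print(float(next_head_prob(n, k)), (n + 1) / (n + 2))
```

With only two coins (k = 1), any observed head identifies the coin that always lands heads, so the exact answer is 1 for every n of at least 1, far from the large-k approximation.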


Example 5f Updating information sequentially 
Suppose there are n mutually exclusive and exhaustive possible hypotheses,
with initial (sometimes referred to as prior) probabilities $P(H_i)$, $\sum_{i=1}^{n} P(H_i) = 1$.
Now, if information that the event E has occurred is received, then the conditional
probability that $H_i$ is the true hypothesis (sometimes referred to as the updated or
posterior probability of $H_i$) is

$$P(H_i \mid E) = \frac{P(E \mid H_i)P(H_i)}{\sum_j P(E \mid H_j)P(H_j)} \tag{5.13}$$

Suppose now that we learn first that $E_1$ has occurred and then that $E_2$ has
occurred. Then, given only the first piece of information, the conditional
probability that $H_i$ is the true hypothesis is

$$P(H_i \mid E_1) = \frac{P(E_1 \mid H_i)P(H_i)}{P(E_1)} = \frac{P(E_1 \mid H_i)P(H_i)}{\sum_j P(E_1 \mid H_j)P(H_j)}$$

whereas given both pieces of information, the conditional probability that $H_i$ is the
true hypothesis is $P(H_i \mid E_1E_2)$, which can be computed by

$$P(H_i \mid E_1E_2) = \frac{P(E_1E_2 \mid H_i)P(H_i)}{\sum_j P(E_1E_2 \mid H_j)P(H_j)}$$

One might wonder, however, when one can compute $P(H_i \mid E_1E_2)$ by using the
right side of Equation (5.13) with $E = E_2$ and with $P(H_j)$ replaced by
$P(H_j \mid E_1)$, j = 1, ..., n. That is, when is it legitimate to regard $P(H_j \mid E_1)$, j = 1, ..., n, as
the prior probabilities and then use (5.13) to compute the posterior
probabilities?


Solution 


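The answer is that the preceding is legitimate, provided that for each j = 1, ..., n,
the events $E_1$ and $E_2$ are conditionally independent, given $H_j$. For if this is the
case, then

$$P(E_1E_2 \mid H_j) = P(E_2 \mid H_j)P(E_1 \mid H_j), \qquad j = 1, \ldots, n$$

Therefore,

$$\begin{aligned}
P(H_i \mid E_1E_2) &= \frac{P(E_2 \mid H_i)P(E_1 \mid H_i)P(H_i)}{P(E_1E_2)} \\
&= \frac{P(E_2 \mid H_i)P(E_1H_i)}{P(E_1E_2)} \\
&= \frac{P(E_2 \mid H_i)P(H_i \mid E_1)P(E_1)}{P(E_1E_2)} \\
&= \frac{P(E_2 \mid H_i)P(H_i \mid E_1)}{Q(1,2)}
\end{aligned}$$

where $Q(1,2) = \frac{P(E_1E_2)}{P(E_1)}$. Since the preceding equation is valid for all i, we
obtain, upon summing,

$$1 = \sum_{i=1}^{n} P(H_i \mid E_1E_2) = \sum_{i=1}^{n} \frac{P(E_2 \mid H_i)P(H_i \mid E_1)}{Q(1,2)}$$

showing that

$$Q(1,2) = \sum_{i=1}^{n} P(E_2 \mid H_i)P(H_i \mid E_1)$$

and yielding the result

$$P(H_i \mid E_1E_2) = \frac{P(E_2 \mid H_i)P(H_i \mid E_1)}{\sum_{j=1}^{n} P(E_2 \mid H_j)P(H_j \mid E_1)}$$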

For instance, suppose that one of two coins is chosen to be flipped. Let $H_i$ be the
event that coin i, i = 1, 2, is chosen, and suppose that when coin i is flipped, it
lands on heads with probability $p_i$, i = 1, 2. Then, the preceding equations show
that to sequentially update the probability that coin 1 is the one being flipped, 
given the results of the previous flips, all that must be saved after each new flip is 
the conditional probability that coin 1 is the coin being used. That is, it is not 
necessary to keep track of all earlier results. 
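
The two-coin illustration can be sketched in code: updating one flip at a time, and keeping only the current probability that coin 1 is in use, gives the same answer as processing all flips at once. The function names and the particular head probabilities below are our own choices for illustration.

```python
from fractions import Fraction


def update(prior1, p1, p2, heads):
    """One application of (5.13): the new probability that coin 1 is in use
    after observing a single flip (heads=True or False)."""
    like1 = p1 if heads else 1 - p1
    like2 = p2 if heads else 1 - p2
    return like1 * prior1 / (like1 * prior1 + like2 * (1 - prior1))


def batch_posterior(prior1, p1, p2, flips):
    """Posterior for coin 1 computed from the whole flip sequence at once."""
    like1 = like2 = Fraction(1)
    for heads in flips:
        like1 *= p1 if heads else 1 - p1
        like2 *= p2 if heads else 1 - p2
    return like1 * prior1 / (like1 * prior1 + like2 * (1 - prior1))


p1, p2 = Fraction(3, 4), Fraction(1, 4)   # assumed head probabilities
flips = [True, True, False]

sequential = Fraction(1, 2)               # prior probability of coin 1
for heads in flips:
    sequential = update(sequential, p1, p2, heads)

print(sequential, batch_posterior(Fraction(1, 2), p1, p2, flips))
```

The agreement relies on the flips being conditionally independent given which coin was chosen, exactly the condition identified in the solution.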


Summary 


For events E and F, the conditional probability of E given that F has occurred is
denoted by P(E|F) and is defined by

$$P(E \mid F) = \frac{P(EF)}{P(F)}$$

The identity

$$P(E_1E_2 \cdots E_n) = P(E_1)P(E_2 \mid E_1) \cdots P(E_n \mid E_1 \cdots E_{n-1})$$

is known as the multiplication rule of probability.

A valuable identity is

$$P(E) = P(E \mid F)P(F) + P(E \mid F^c)P(F^c)$$

which can be used to compute P(E) by “conditioning” on whether F occurs.


$P(H)/P(H^c)$ is called the odds of the event H. The identity

$$\frac{P(H \mid E)}{P(H^c \mid E)} = \frac{P(H)\,P(E \mid H)}{P(H^c)\,P(E \mid H^c)}$$


shows that when new evidence E is obtained, the value of the odds of H becomes its 
old value multiplied by the ratio of the conditional probability of the new evidence 
when H is true to the conditional probability when H is not true. 


Let $F_i$, i = 1, ..., n, be mutually exclusive events whose union is the entire sample
space. The identity

$$P(F_j \mid E) = \frac{P(E \mid F_j)P(F_j)}{\sum_{i=1}^{n} P(E \mid F_i)P(F_i)}$$

is known as Bayes’s formula. If the events $F_i$, i = 1, ..., n, are competing hypotheses,
then Bayes’s formula shows how to compute the conditional probabilities of these
hypotheses when additional evidence E becomes available.


The denominator of Bayes’s formula uses that

$$P(E) = \sum_{i=1}^{n} P(E \mid F_i)P(F_i)$$

which is called the law of total probability.


If P(EF) = P(E)P(F), then we say that the events E and F are independent. This
condition is equivalent to P(E|F) = P(E) and to P(F|E) = P(F). Thus, the events E
and F are independent if knowledge of the occurrence of one of them does not affect
the probability of the other.


The events $E_1, \ldots, E_n$ are said to be independent if, for any subset $E_{i_1}, \ldots, E_{i_r}$ of them,

$$P(E_{i_1} \cdots E_{i_r}) = P(E_{i_1}) \cdots P(E_{i_r})$$

For a fixed event F, P(E | F) can be considered to be a probability function on the 
events E of the sample space. 


Problems 


3.1. Two fair dice are rolled. What is the conditional probability that at least 
one lands on 6 given that the dice land on different numbers? 

3.2. If two fair dice are rolled, what is the conditional probability that the first 
one lands on 6 given that the sum of the dice is i? Compute for all values of i 
between 2 and 12. 

3.3. Use Equation (2.1) to compute in a hand of bridge the conditional
probability that East has 3 spades given that North and South have a 
combined total of 8 spades. 

3.4. What is the probability that at least one of a pair of fair dice lands on 6, 
given that the sum of the dice is i, i = 2,3,...,12? 

3.5. An urn contains 6 white and 9 black balls. If 4 balls are to be randomly 
selected without replacement, what is the probability that the first 2 selected 
are white and the last 2 black? 

3.6. Consider an urn containing 12 balls, of which 8 are white. A sample of 
size 4 is to be drawn with replacement (without replacement). What is the 
conditional probability (in each case) that the first and third balls drawn will be 
white given that the sample drawn contains exactly 3 white balls? 

3.7. The king comes from a family of 2 children. What is the probability that 
the other child is his sister? 

3.8. A couple has 2 children. What is the probability that both are girls if the 
older of the two is a girl? 

3.9. Consider 3 urns. Urn A contains 2 white and 4 red balls, urn B contains 8 
white and 4 red balls, and urn C contains 1 white and 3 red balls. If 1 ball is 
selected from each urn, what is the probability that the ball chosen from urn A 
was white given that exactly 2 white balls were selected? 

3.10. Three cards are randomly selected, without replacement, from an 
ordinary deck of 52 playing cards. Compute the conditional probability that the 
first card selected is a spade given that the second and third cards are 
spades. 


3.11. Two cards are randomly chosen without replacement from an ordinary 
deck of 52 cards. Let B be the event that both cards are aces, let $A_s$ be the
event that the ace of spades is chosen, and let A be the event that at least
one ace is chosen. Find

a. $P(B \mid A_s)$

b. $P(B \mid A)$


3.12. Suppose distinct values are written on each of 3 cards, which are then 
randomly given the designations A, B, and C. Given that card A's value is less 
than card B’s value, find the probability it is also less than card C’s value. 
3.13. A recent college graduate is planning to take the first three actuarial 
examinations in the coming summer. She will take the first actuarial exam in 
June. If she passes that exam, then she will take the second exam in July, 
and if she also passes that one, then she will take the third exam in 
September. If she fails an exam, then she is not allowed to take any others. 
The probability that she passes the first exam is .9. If she passes the first 
exam, then the conditional probability that she passes the second one is .8, 
and if she passes both the first and the second exams, then the conditional 
probability that she passes the third exam is .7. 

a. What is the probability that she passes all three exams? 

b. Given that she did not pass all three exams, what is the conditional 

probability that she failed the second exam? 


3.14. Suppose that an ordinary deck of 52 cards (which contains 4 aces) is 
randomly divided into 4 hands of 13 cards each. We are interested in 
determining p, the probability that each hand has an ace. Let $E_i$ be the event
that the ith hand has exactly one ace. Determine $p = P(E_1E_2E_3E_4)$ by using
the multiplication rule.
3.15. An urn initially contains 5 white and 7 black balls. Each time a ball is 
selected, its color is noted and it is replaced in the urn along with 2 other balls 
of the same color. Compute the probability that 

a. the first 2 balls selected are black and the next 2 are white; 

b. of the first 4 balls selected, exactly 2 are black. 


3.16. An ectopic pregnancy is twice as likely to develop when the pregnant 
woman is a smoker as it is when she is a nonsmoker. If 32 percent of women 
of childbearing age are smokers, what percentage of women having ectopic 
pregnancies are smokers? 

3.17. Ninety-eight percent of all babies survive delivery. However, 15 percent 
of all births involve Cesarean (C) sections, and when a C section is 
performed, the baby survives 96 percent of the time. If a randomly chosen 
pregnant woman does not have a C section, what is the probability that her
baby survives? 
3.18. In a certain community, 36 percent of the families own a dog and 22 
percent of the families that own a dog also own a cat. In addition, 30 percent 
of the families own a cat. What is 
a. the probability that a randomly selected family owns both a dog and a 
cat? 
b. the conditional probability that a randomly selected family owns a dog 
given that it owns a cat? 


3.19. A total of 46 percent of the voters in a certain city classify themselves as 
Independents, whereas 30 percent classify themselves as Liberals and 24 
percent say that they are Conservatives. In a recent local election, 35 percent 
of the Independents, 62 percent of the Liberals, and 58 percent of the 
Conservatives voted. A voter is chosen at random. Given that this person 
voted in the local election, what is the probability that he or she is 

a. an Independent? 

b. a Liberal? 

c. a Conservative? 

d. What percent of voters participated in the local election? 
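
Problems of this type combine a set of prior probabilities with conditional probabilities through Bayes's formula. The sketch below is one possible way to organize that arithmetic; the function name `posterior` is an assumption made for illustration, and the inputs shown are the voter percentages stated in the problem.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return (P(evidence), list of P(class | evidence)) via Bayes's formula.

    priors[i]      = P(class i)
    likelihoods[i] = P(evidence | class i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # law of total probability
    return total, [j / total for j in joint]


if __name__ == "__main__":
    # Independents, Liberals, Conservatives and their turnout rates
    priors = [Fraction(46, 100), Fraction(30, 100), Fraction(24, 100)]
    turnout = [Fraction(35, 100), Fraction(62, 100), Fraction(58, 100)]
    voted, given_voted = posterior(priors, turnout)
    print(voted, given_voted)
```

The same helper applies to any problem in this set that supplies class proportions and per-class conditional probabilities.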


3.20. A total of 48 percent of the women and 37 percent of the men who took 
a certain “quit smoking” class remained nonsmokers for at least one year after 
completing the class. These people then attended a success party at the end 
of a year. If 62 percent of the original class was male, 

a. what percentage of those attending the party were women? 

b. what percentage of the original class attended the party? 


3.21. Fifty-two percent of the students at a certain college are females. Five 
percent of the students in this college are majoring in computer science. Two 
percent of the students are women majoring in computer science. If a student 
is selected at random, find the conditional probability that 
a. the student is female given that the student is majoring in computer 
science; 
b. this student is majoring in computer science given that the student is 
female. 


3.22. A total of 500 married working couples were polled about their annual 
salaries, with the following information resulting: 


                              Husband
Wife                  Less than $125,000   More than $125,000
Less than $125,000           212                  198
More than $125,000            36                   54


For instance, in 36 of the couples, the wife earned more and the husband 
earned less than $125,000. If one of the couples is randomly chosen, what is 
a. the probability that the husband earns less than $125,000? 
b. the conditional probability that the wife earns more than $125,000 given 
that the husband earns more than this amount? 
c. the conditional probability that the wife earns more than $125,000 given 
that the husband earns less than this amount? 


3.23. A red die, a blue die, and a yellow die (all six sided) are rolled. We are 
interested in the probability that the number appearing on the blue die is less 
than that appearing on the yellow die, which is less than that appearing on the 
red die. That is, with B, Y, and R denoting, respectively, the number appearing 
on the blue, yellow, and red die, we are interested in P(B < Y < R). 
a. What is the probability that no two of the dice land on the same 
number? 
b. Given that no two of the dice land on the same number, what is the 
conditional probability that B < Y < R? 
c. What is P(B < Y < R)?


3.24. Urn I contains 2 white and 4 red balls, whereas urn II contains 1 white
and 1 red ball. A ball is randomly chosen from urn I and put into urn II, and a
ball is then randomly selected from urn II. What is 
a. the probability that the ball selected from urn II is white? 
b. the conditional probability that the transferred ball was white given that 
a white ball is selected from urn II? 


3.25. Twenty percent of B’s phone calls are with her daughter. Sixty-five
percent of the time that B speaks with her daughter she hangs up the phone
with a smile on her face. Given that B has just hung up the phone with a smile
on her face, we are interested in the conditional probability that the phone call
was with her daughter. Do we have enough information to determine this
probability? If yes, what is it? If no, what additional information is needed?

3.26. Each of 2 balls is painted either black or gold and then placed in an urn.
Suppose that each ball is colored black with probability 1/2 and that these
events are independent.

a. Suppose that you obtain information that the gold paint has been used 
(and thus at least one of the balls is painted gold). Compute the 
conditional probability that both balls are painted gold. 

b. Suppose now that the urn tips over and 1 ball falls out. It is painted 
gold. What is the probability that both balls are gold in this case? 
Explain. 


3.27. The following method was proposed to estimate the number of people 
over the age of 50 who reside in a town of known population 100,000: “As you 
walk along the streets, keep a running count of the percentage of people you 
encounter who are over 50. Do this for a few days; then multiply the 
percentage you obtain by 100,000 to obtain the estimate.” Comment on this 
method. 
Hint: Let p denote the proportion of people in the town who are over 50. 
Furthermore, let α_1 denote the proportion of time that a person under the age
of 50 spends in the streets, and let α_2 be the corresponding value for those
over 50. What quantity does the method suggested estimate? When is the 
estimate approximately equal to p? 
3.28. Suppose that 5 percent of men and 0.25 percent of women are color 
blind. A color-blind person is chosen at random. What is the probability of this 
person being male? Assume that there are an equal number of males and 
females. What if the population consisted of twice as many males as females? 
3.29. All the workers at a certain company drive to work and park in the 
company’s lot. The company is interested in estimating the average number of 
workers in a car. Which of the following methods will enable the company to 
estimate this quantity? Explain your answer. 

A. Randomly choose n workers, find out how many were in the cars in
which they were driven, and take the average of the n values.
B. Randomly choose n cars in the lot, find out how many were driven in 
those cars, and take the average of the n values. 


3.30. Suppose that an ordinary deck of 52 cards is shuffled and the cards are
then turned over one at a time until the first ace appears. Given that the first 
ace is the 20th card to appear, what is the conditional probability that the card 
following it is the 

a. ace of spades? 

b. two of clubs? 


3.31. There are 15 tennis balls in a box, of which 9 have not previously been 
used. Three of the balls are randomly chosen, played with, and then returned 
to the box. Later, another 3 balls are randomly chosen from the box. Find the
probability that none of these balls has ever been used. 
3.32. Consider two boxes, one containing 1 black and 1 white marble, the 
other 2 black and 1 white marble. A box is selected at random, and a marble 
is drawn from it at random. What is the probability that the marble is black? 
What is the probability that the first box was the one selected given that the 
marble is white? 
3.33. Ms. Aquina has just had a biopsy on a possibly cancerous tumor. Not 
wanting to spoil a weekend family event, she does not want to hear any bad 
news in the next few days. But if she tells the doctor to call only if the news is 
good, then if the doctor does not call, Ms. Aquina can conclude that the news 
is bad. So, being a student of probability, Ms. Aquina instructs the doctor to 
flip a coin. If it comes up heads, the doctor is to call if the news is good and 
not call if the news is bad. If the coin comes up tails, the doctor is not to call. 
In this way, even if the doctor doesn't call, the news is not necessarily bad. Let 
α be the probability that the tumor is cancerous; let β be the conditional
probability that the tumor is cancerous given that the doctor does not call.

a. Which should be larger, α or β?

b. Find β in terms of α, and prove your answer in part (a).


3.34. A family has j children with probability p_j, where
p_1 = .1, p_2 = .25, p_3 = .35, p_4 = .3. A child from this family is randomly
chosen. Given that this child is the eldest child in the family, find the 
conditional probability that the family has 

a. only 1 child; 

b. 4 children. 


Redo (a) and (b) when the randomly selected child is the youngest child of the 
family. 
3.35. On rainy days, Joe is late to work with probability .3; on nonrainy days, 
he is late with probability .1. With probability .7, it will rain tomorrow. 
a. Find the probability that Joe is early tomorrow. 
b. Given that Joe was early, what is the conditional probability that it 
rained? 


3.36. In Example 3f, suppose that the new evidence is subject to different
possible interpretations and in fact shows only that it is 90 percent likely that 
the criminal possesses the characteristic in question. In this case, how likely 
would it be that the suspect is guilty (assuming, as before, that he has the 
characteristic)? 

3.37. With probability .6, the present was hidden by mom; with probability .4, it 
was hidden by dad. When mom hides the present, she hides it upstairs 70
percent of the time and downstairs 30 percent of the time. Dad is equally likely 
to hide it upstairs or downstairs. 

a. What is the probability that the present is upstairs? 

b. Given that it is downstairs, what is the probability it was hidden by dad? 


3.38. Stores A, B, and C have 50, 75, and 100 employees, respectively, and 
50, 60, and 70 percent of them respectively are women. Resignations are 
equally likely among all employees, regardless of sex. One woman employee 
resigns. What is the probability that she works in store C? 
3.39. 
a. A gambler has a fair coin and a two-headed coin in his pocket. He 
selects one of the coins at random; when he flips it, it shows heads. 
What is the probability that it is the fair coin? 
b. Suppose that he flips the same coin a second time and, again, it shows 
heads. Now what is the probability that it is the fair coin? 
c. Suppose that he flips the same coin a third time and it shows tails. Now 
what is the probability that it is the fair coin? 


3.40. Urn A has 5 white and 7 black balls. Urn B has 3 white and 12 black 
balls. We flip a fair coin. If the outcome is heads, then a ball from urn A is 
selected, whereas if the outcome is tails, then a ball from urn B is selected. 
Suppose that a white ball is selected. What is the probability that the coin 
landed tails? 
3.41. In Example 3a, what is the probability that someone has an accident
in the second year given that he or she had no accidents in the first year? 
3.42. Consider a sample of size 3 drawn in the following manner: We start 
with an urn containing 5 white and 7 red balls. At each stage, a ball is drawn 
and its color is noted. The ball is then returned to the urn, along with an 
additional ball of the same color. Find the probability that the sample will 
contain exactly 

a. 0 white balls; 

b. 1 white ball; 

c. 3 white balls; 

d. 2 white balls. 


3.43. A deck of cards is shuffled and then divided into two halves of 26 cards 
each. A card is drawn from one of the halves; it turns out to be an ace. The 
ace is then placed in the second half-deck. The half is then shuffled, and a 
card is drawn from it. Compute the probability that this drawn card is an ace. 
Hint: Condition on whether or not the interchanged card is selected. 
3.44. Twelve percent of all U.S. households are in California. A total of 1.3
percent of all U.S. households earn more than $250,000 per year, while a total 
of 3.3 percent of all California households earn more than $250,000 per year. 
a. What proportion of all non-California households earn more than 
$250,000 per year? 
b. Given that a randomly chosen U.S. household earns more than 
$250,000 per year, what is the probability it is a California household? 


3.45. There are 3 coins in a box. One is a two-headed coin, another is a fair 
coin, and the third is a biased coin that comes up heads 75 percent of the 
time. When one of the 3 coins is selected at random and flipped, it shows 
heads. What is the probability that it was the two-headed coin? 

3.46. Three prisoners are informed by their jailer that one of them has been 
chosen at random to be executed and the other two are to be freed. Prisoner 
A asks the jailer to tell him privately which of his fellow prisoners will be set 
free, claiming that there would be no harm in divulging this information 
because he already knows that at least one of the two will go free. The jailer 
refuses to answer the question, pointing out that if A knew which of his fellow 
prisoners were to be set free, then his own probability of being executed
would rise from 1/3 to 1/2 because he would then be one of two prisoners. What
do you think of the jailer’s reasoning?
3.47. There is a 30 percent chance that A can fix her busted computer. If A 
cannot, then there is a 40 percent chance that her friend B can fix it. 

a. Find the probability it will be fixed by either A or B. 

b. If it is fixed, what is the probability it will be fixed by B?


3.48. In any given year, a male automobile policyholder will make a claim with
probability p_m, and a female policyholder will make a claim with probability p_f,
where p_f ≠ p_m. The fraction of the policyholders that are male is α, 0 < α < 1.
A policyholder is randomly chosen. If A_i denotes the event that this
policyholder will make a claim in year i, show that

P(A_2|A_1) > P(A_1)

Give an intuitive explanation of why the preceding inequality is true.

3.49. An urn contains 5 white and 10 black balls. A fair die is rolled and that 
number of balls is randomly chosen from the urn. What is the probability that 
all of the balls selected are white? What is the conditional probability that the 
die landed on 3 if all the balls selected are white? 

3.50. Each of 2 cabinets identical in appearance has 2 drawers. Cabinet A 
contains a silver coin in each drawer, and cabinet B contains a silver coin in 
one of its drawers and a gold coin in the other. A cabinet is randomly selected,
one of its drawers is opened, and a silver coin is found. What is the probability
that there is a silver coin in the other drawer? 
3.51. Prostate cancer is the most common type of cancer found in males. As 
an indicator of whether a male has prostate cancer, doctors often perform a 
test that measures the level of the prostate-specific antigen (PSA) that is 
produced only by the prostate gland. Although PSA levels are indicative of 
cancer, the test is notoriously unreliable. Indeed, the probability that a 
noncancerous man will have an elevated PSA level is approximately .135,
increasing to approximately .268 if the man does have cancer. If, on the basis 
of other factors, a physician is 70 percent certain that a male has prostate 
cancer, what is the conditional probability that he has the cancer given that 

a. the test indicated an elevated PSA level? 

b. the test did not indicate an elevated PSA level? 


Repeat the preceding calculation, this time assuming that the physician 
initially believes that there is a 30 percent chance that the man has prostate 
cancer. 
3.52. Suppose that an insurance company classifies people into one of three 
classes: good risks, average risks, and bad risks. The company’s records 
indicate that the probabilities that good-, average-, and bad-risk persons will 
be involved in an accident over a 1-year span are, respectively, .05, .15, and 
.30. If 20 percent of the population is a good risk, 50 percent an average risk, 
and 30 percent a bad risk, what proportion of people have accidents in a fixed 
year? If policyholder A had no accidents in 2012, what is the probability that 
he or she is a good risk? is an average risk? 
3.53. A worker has asked her supervisor for a letter of recommendation for a 
new job. She estimates that there is an 80 percent chance that she will get the 
job if she receives a strong recommendation, a 40 percent chance if she 
receives a moderately good recommendation, and a 10 percent chance if she 
receives a weak recommendation. She further estimates that the probabilities 
that the recommendation will be strong, moderate, and weak are .7, .2, and .1, 
respectively. 
a. How certain is she that she will receive the new job offer? 
b. Given that she does receive the offer, how likely should she feel that 
she received a strong recommendation? a moderate recommendation? 
a weak recommendation? 
c. Given that she does not receive the job offer, how likely should she feel 
that she received a strong recommendation? a moderate 
recommendation? a weak recommendation? 


3.54. Players A, B, C, D are randomly lined up. The first two players in line 
then play a game; the winner of that game then plays a game with the person
who is third in line; the winner of that game then plays a game with the person 
who is fourth in line. The winner of that last game is considered the winner of 
the tournament. If A wins each game it plays with probability p, determine the 
probability that A is the winner of the tournament. 

3.55. Players 1, 2, 3 are playing a tournament. Two of these three players are
randomly chosen to play a game in round one, with the winner then playing
the remaining player in round two. The winner of round two is the tournament
victor. Assume that all games are independent and that i wins when playing
against j with probability i/(i + j).


a. Find the probability that 1 is the tournament victor. 
b. If 1 is the tournament victor, find the conditional probability that 1 did 
not play in round one. 


3.56. Suppose there are two coins, with coin 1 landing heads when flipped 
with probability .3 and coin 2 with probability .5. Suppose also that we 
randomly select one of these coins and then continually flip it. Let H_j denote
the event that flip j, j ≥ 1, lands heads. Also, let C_i be the event that coin i
was chosen, i = 1, 2.

a. Find P(H_1).

b. Find P(H_2|H_1).

c. Find P(C_1|H_1).

d. Find P(H_2 H_3 H_4|H_1).


3.57. In a 7-game series played with two teams, the first team to win a total of
4 games is the winner. Suppose that each game played is independently won
by team A with probability p.
a. Given that one team leads 3 to 0, what is the probability that it is team
A that leads?
b. Given that one team leads 3 to 0, what is the probability that that team wins
the series?


3.58. A parallel system functions whenever at least one of its components 
works. Consider a parallel system of n components, and suppose that each
component works independently with probability 1/2. Find the conditional
probability that component 1 works given that the system is functioning.
3.59. If you had to construct a mathematical model for events E and F, as 
described in parts (a) through (e), would you assume that they were 
independent events? Explain your reasoning. 

a. E is the event that a businesswoman has blue eyes, and F is the event 
that her secretary has blue eyes.

b. E is the event that a professor owns a car, and F is the event that he is 
listed in the telephone book. 

c. E is the event that a man is under 6 feet tall, and F is the event that he 
weighs more than 200 pounds. 

d. E is the event that a woman lives in the United States, and F is the 
event that she lives in the Western Hemisphere. 

e. E is the event that it will rain tomorrow, and F is the event that it will rain 
the day after tomorrow. 


3.60. In a class, there are 4 first-year boys, 6 first-year girls, and 6 sophomore 
boys. How many sophomore girls must be present if sex and class are to be 
independent when a student is selected at random? 
3.61. Suppose that you continually collect coupons and that there are m 
different types. Suppose also that each time a new coupon is obtained, it is a 
type i coupon with probability p_i, i = 1, ..., m. Suppose that you have just
collected your nth coupon. What is the probability that it is a new type? 
Hint: Condition on the type of this coupon. 
3.62. A simplified model for the movement of the price of a stock supposes 
that on each day the stock’s price either moves up 1 unit with probability p or 
moves down 1 unit with probability 1 - p. The changes on different days are
assumed to be independent. 
a. What is the probability that after 2 days the stock will be at its original 
price? 
b. What is the probability that after 3 days the stock’s price will have 
increased by 1 unit? 
c. Given that after 3 days the stock’s price has increased by 1 unit, what 
is the probability that it went up on the first day? 


3.63. Suppose that we want to generate the outcome of the flip of a fair coin, 
but that all we have at our disposal is a biased coin that lands on heads with
some unknown probability p that need not be equal to 1/2. Consider the
following procedure for accomplishing our task:
1. Flip the coin. 
2. Flip the coin again. 
3. If both flips land on heads or both land on tails, return to step 1. 
4. Let the result of the last flip be the result of the experiment. 


a. Show that the result is equally likely to be either heads or tails.
b. Could we use a simpler procedure that continues to flip the coin until
the last two flips are different and then lets the result be the outcome of
the final flip?
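
Steps 1 through 4 of the procedure above translate directly into a short simulation, which can be used to explore the behavior empirically before attempting a proof. This is only a sketch under stated assumptions: the names `fair_flip` and `heads_frequency` are illustrative, and the bias p = 0.8 in the usage lines is an arbitrary choice.

```python
import random


def fair_flip(p, rng):
    """Run the four-step procedure once with a coin of heads probability p."""
    while True:
        first = rng.random() < p    # step 1: True means heads
        second = rng.random() < p   # step 2
        if first != second:         # step 3: otherwise start over
            return "H" if second else "T"  # step 4: result of the last flip


def heads_frequency(p, trials, seed=0):
    """Fraction of `trials` independent runs of the procedure that give heads."""
    rng = random.Random(seed)
    heads = sum(fair_flip(p, rng) == "H" for _ in range(trials))
    return heads / trials


if __name__ == "__main__":
    print(heads_frequency(0.8, 20000))
```

A simulation of this kind suggests an answer but does not replace the argument asked for in part (a).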


3.64. Independent flips of a coin that lands on heads with probability p are
made. What is the probability that the first four outcomes are
a. H, H, H, H?
b. T, H, H, H?
c. What is the probability that the pattern H, H, H, H occurs before the
pattern T, H, H, H?

Hint for part (c): How can the pattern H, H, H, H occur first?
3.65. The color of a person’s eyes is determined by a single pair of genes. If 
they are both blue-eyed genes, then the person will have blue eyes; if they are 
both brown-eyed genes, then the person will have brown eyes; and if one of 
them is a blue-eyed gene and the other a brown-eyed gene, then the person 
will have brown eyes. (Because of the latter fact, we say that the brown-eyed 
gene is dominant over the blue-eyed one.) A newborn child independently 
receives one eye gene from each of its parents, and the gene it receives from 
a parent is equally likely to be either of the two eye genes of that parent. 
Suppose that Smith and both of his parents have brown eyes, but Smith’s 
sister has blue eyes. 
a. What is the probability that Smith possesses a blue-eyed gene?
b. Suppose that Smith’s wife has blue eyes. What is the probability that 
their first child will have blue eyes? 
c. If their first child has brown eyes, what is the probability that their next 
child will also have brown eyes? 


3.66. Genes relating to albinism are denoted by A and a. Only those people 
who receive the a gene from both parents will be albino. Persons having the 
gene pair A, a are normal in appearance and, because they can pass on the 
trait to their offspring, are called carriers. Suppose that a normal couple has 
two children, exactly one of whom is an albino. Suppose that the nonalbino 
child mates with a person who is known to be a carrier for albinism. 

a. What is the probability that their first offspring is an albino? 

b. What is the conditional probability that their second offspring is an
albino given that their firstborn is not?


3.67. Barbara and Dianne go target shooting. Suppose that each of Barbara’s 
shots hits a wooden duck target with probability p_1, while each shot of
Dianne’s hits it with probability p_2. Suppose that they shoot simultaneously at
the same target. If the wooden duck is knocked over (indicating that it was 
hit), what is the probability that 

a. both shots hit the duck? 
b. Barbara’s shot hit the duck?


What independence assumptions have you made? 
3.68. A and B are involved in a duel. The rules of the duel are that they are to 
pick up their guns and shoot at each other simultaneously. If one or both are 
hit, then the duel is over. If both shots miss, then they repeat the process. 
Suppose that the results of the shots are independent and that each shot of A 
will hit B with probability p_A, and each shot of B will hit A with probability p_B.
What is 
a. the probability that A is not hit? 
b. the probability that both duelists are hit? 
c. the probability that the duel ends after the nth round of shots? 
d. the conditional probability that the duel ends after the nth round of 
shots given that A is not hit? 
e. the conditional probability that the duel ends after the nth round of 
shots given that both duelists are hit? 


3.69. Assume, as in Example 3h, that 64 percent of twins are of the same
sex. Given that a newborn set of twins is of the same sex, what is the 
conditional probability that the twins are identical? 

3.70. The probability of the closing of the ith relay in the circuits shown in 
Figure 3.5 is given by p_i, i = 1, 2, 3, 4, 5. If all relays function
independently, what is the probability that a current flows between A and B for 
the respective circuits? 

Figure 3.5 Circuits for problem 3.70 
Hint for (b): Condition on whether relay 3 closes.
3.71. An engineering system consisting of n components is said to be a
k-out-of-n system (k ≤ n) if the system functions if and only if at least k of the n
components function. Suppose that all components function independently of
one another.
a. If the ith component functions with probability P_i, i = 1, 2, 3, 4, compute
the probability that a 2-out-of-4 system functions.
b. Repeat part (a) for a 3-out-of-5 system.
c. Repeat for a k-out-of-n system when all the P_i equal p (that is,
P_i = p, i = 1, 2, ..., n).
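
The definition of a k-out-of-n system can be checked numerically by enumerating every pattern of working and failed components and adding up the probabilities of the patterns in which at least k components work. The sketch below is one way to do this; the function name `k_out_of_n` is an assumption for illustration, and brute-force enumeration is a choice made for clarity, practical only for small n.

```python
from fractions import Fraction
from itertools import product


def k_out_of_n(probs, k):
    """Probability that at least k independent components function.

    probs[i] is the probability that component i functions.
    """
    total = 0
    for states in product((True, False), repeat=len(probs)):
        if sum(states) >= k:  # this pattern keeps the system functioning
            weight = 1
            for works, p in zip(states, probs):
                weight *= p if works else 1 - p  # independence of components
            total += weight
    return total


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(k_out_of_n([half] * 4, 2))
```

Passing `Fraction` values gives exact results; passing floats works as well and returns a float.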


3.72. In Problem 3.70a, find the conditional probability that relays 1 and 2
are both closed given that a current flows from A to B. 

3.73. A certain organism possesses a pair of each of 5 different genes (which 
we will designate by the first 5 letters of the English alphabet). Each gene 
appears in 2 forms (which we designate by lowercase and capital letters). The 
capital letter will be assumed to be the dominant gene, in the sense that if an 
organism possesses the gene pair xX, then it will outwardly have the 
appearance of the X gene. For instance, if X stands for brown eyes and x for 
blue eyes, then an individual having either gene pair XX or xX will have brown 
eyes, whereas one having gene pair xx will have blue eyes. The characteristic 
appearance of an organism is called its phenotype, whereas its genetic 
constitution is called its genotype. (Thus, 2 organisms with respective 
genotypes aA, bB, cc, dD, ee and AA, BB, cc, DD, ee would have different 
genotypes but the same phenotype.) In a mating between 2 organisms, each 
one contributes, at random, one of its gene pairs of each type. The 5 
contributions of an organism (one of each of the 5 types) are assumed to be 
independent and are also independent of the contributions of the organism’s 
mate. In a mating between organisms having genotypes aA, bB, cC, dD, eE 
and aa, bB, cc, Dd, ee what is the probability that the progeny will (i) 
phenotypically and (ii) genotypically resemble 
a. the first parent?

b. the second parent? 
c. either parent? 

d. neither parent? 


3.74. There is a 50-50 chance that the queen carries the gene for hemophilia. 
If she is a carrier, then each prince has a 50-50 chance of having hemophilia. 
If the queen has had three princes without the disease, what is the probability 
that the queen is a carrier? If there is a fourth prince, what is the probability 
that he will have hemophilia? 
3.75. A town council of 7 members contains a steering committee of size 3. 
New ideas for legislation go first to the steering committee and then on to the 
council as a whole if at least 2 of the 3 committee members approve the 
legislation. Once at the full council, the legislation requires a majority vote (of 
at least 4) to pass. Consider a new piece of legislation, and suppose that each 
town council member will approve it, independently, with probability p. What is 
the probability that a given steering committee member’s vote is decisive in 
the sense that if that person’s vote were reversed, then the final fate of the 
legislation would be reversed? What is the corresponding probability for a 
given council member not on the steering committee? 
3.76. Suppose that each child born to a couple is equally likely to be a boy or a
girl, independently of the sex distribution of the other children in the family. For 
a couple having 5 children, compute the probabilities of the following events: 

a. All children are of the same sex. 

b. The 3 eldest are boys and the others girls. 

c. Exactly 3 are boys. 

d. The 2 oldest are girls. 

e. There is at least 1 girl. 


3.77. A and B alternate rolling a pair of dice, stopping either when A rolls the 
sum 9 or when B rolls the sum 6. Assuming that A rolls first, find the 
probability that the final roll is made by A. 
3.78. In a certain village, it is traditional for the eldest son (or the older son in a 
two-son family) and his wife to be responsible for taking care of his parents as 
they age. In recent years, however, the women of this village, not wanting that 
responsibility, have not looked favorably upon marrying an eldest son. 
a. If every family in the village has two children, what proportion of all 
sons are older sons? 
b. If every family in the village has three children, what proportion of all 
sons are eldest sons? 
Assume that each child is, independently, equally likely to be either a 
boy or a girl.


3.79. Suppose that E and F are mutually exclusive events of an experiment.
Show that if independent trials of this experiment are performed, then E will
occur before F with probability P(E)/[P(E) + P(F)].
3.80. Consider an unending sequence of independent trials, where each trial 
is equally likely to result in any of the outcomes 1, 2, or 3. Given that outcome 
3 is the last of the three outcomes to occur, find the conditional probability that 
a. the first trial results in outcome 1; 
b. the first two trials both result in outcome 1. 


3.81. A and B play a series of games. Each game is independently won by A 
with probability p and by B with probability 1 - p. They stop when the total
number of wins of one of the players is two greater than that of the other 
player. The player with the greater number of total wins is declared the winner 
of the series. 

a. Find the probability that a total of 4 games are played. 

b. Find the probability that A is the winner of the series. 


3.82. In successive rolls of a pair of fair dice, what is the probability of getting 
2 sevens before 6 even numbers? 


3.83. In a certain contest, the players are of equal skill and the probability is 1/2
that a specified one of the two contestants will be the victor. In a group of $2^n$
players, the players are paired off against each other at random. The $2^{n-1}$
winners are again paired off randomly, and so on, until a single winner
remains. Consider two specified contestants, A and B, and define the events
$A_i$, $i \le n$, and E by

$A_i$: A plays in exactly i contests

E: A and B never play each other


a. Find $P(A_i)$, $i = 1, \ldots, n$.
b. Find P(E).
c. Let $P_n = P(E)$. Show that

\[ P_n = \frac{1}{2^n - 1} + \frac{2^n - 2}{2^n - 1} \left( \frac{1}{2} \right)^2 P_{n-1} \]

and use this formula to check the answer you obtained in part (b).
Hint: Find P(E) by conditioning on which of the events $A_i$, $i = 1, \ldots, n$,
occur. In simplifying your answer, use the algebraic identity

For another approach to solving this problem, note that there are a total
of $2^n - 1$ games played.
d. Explain why $2^n - 1$ games are played.
Number these games, and let $B_i$ denote the event that A and B play
each other in game i, $i = 1, \ldots, 2^n - 1$.
e. What is $P(B_i)$?
f. Use part (e) to find P(E).


3.84. An investor owns shares in a stock whose present value is 25. She has 
decided that she must sell her stock if it goes either down to 10 or up to 40. If 
each change of price is either up 1 point with probability .55 or down 1 point 
with probability .45, and the successive changes are independent, what is the 
probability that the investor retires a winner? 
3.85. A and B flip coins. A starts and continues flipping until a tail occurs, at 
which point B starts flipping and continues until there is a tail. Then A takes 
over, and so on. Let $P_1$ be the probability of the coin landing on heads when A
flips and $P_2$ when B flips. The winner of the game is the first one to get

a. 2 heads in a row; 

b. a total of 2 heads; 

c. 3 heads in a row; 

d. a total of 3 heads. 


In each case, find the probability that A wins. 

3.86. Die A has 4 red and 2 white faces, whereas die B has 2 red and 4 white 
faces. A fair coin is flipped once. If it lands on heads, the game continues with 
die A; if it lands on tails, then die B is to be used. 


a. Show that the probability of red at any throw is 1/2.


b. If the first two throws result in red, what is the probability of red at the 
third throw? 

c. If red turns up at the first two throws, what is the probability that it is die 
A that is being used? 


3.87. An urn contains 12 balls, of which 4 are white. Three players, A, B, and
C, successively draw from the urn, A first, then B, then C, then A, and so on.
The winner is the first one to draw a white ball. Find the probability of winning 
for each player if 

a. each ball is replaced after it is drawn; 
b. the balls that are withdrawn are not replaced.


3.88. Repeat Problem 3.87 when each of the 3 players selects from his 
own urn. That is, suppose that there are 3 different urns of 12 balls with 4 
white balls in each urn. 
3.89. Let S = {1, 2, ..., n} and suppose that A and B are, independently, equally
likely to be any of the $2^n$ subsets (including the null set and S itself) of S.

a. Show that

\[ P\{A \subset B\} = \left( \frac{3}{4} \right)^n \]

Hint: Let N(B) denote the number of elements in B. Use

\[ P\{A \subset B\} = \sum_{i=0}^{n} P\{A \subset B \mid N(B) = i\} \, P\{N(B) = i\} \]

b. Show that $P\{AB = \emptyset\} = \left( \frac{3}{4} \right)^n$.


3.90. Consider an eight team tournament with the format given in Figure 
3.6. If the probability that team i beats team j if they play is a, find the 


probability that team 1 wins the tournament. 
Figure 3.6 
3.91. Consider Example 2a, but now suppose that when the key is in a
certain pocket, there is a 10 percent chance that a search of that pocket will
not find the key. Let R and L be, respectively, the events that the key is in the
right-hand pocket of the jacket and that it is in the left-hand pocket. Also, let $S_R$
be the event that a search of the right-hand jacket pocket will be successful in
finding the key, and let $U_L$ be the event that a search of the left-hand jacket
pocket will be unsuccessful and, thus, not find the key. Find $P(S_R \mid U_L)$, the
conditional probability that a search of the right-hand pocket will find the key
given that a search of the left-hand pocket did not, by
a. using the identity

\[ P(S_R \mid U_L) = \frac{P(S_R U_L)}{P(U_L)} \]

determining $P(S_R U_L)$ by conditioning on whether or not the key is in the
right-hand pocket, and determining $P(U_L)$ by conditioning on whether
or not the key is in the left-hand pocket;

b. using the identity

\[ P(S_R \mid U_L) = P(S_R \mid R U_L) P(R \mid U_L) + P(S_R \mid R^c U_L) P(R^c \mid U_L) \]


3.92. In Example 5e, what is the conditional probability that the ith coin was
selected given that the first n trials all result in heads?
3.93. In Laplace’s rule of succession (Example 5e), are the outcomes of
the successive flips independent? Explain.
3.94. A person tried by a 3-judge panel is declared guilty if at least 2 judges 
cast votes of guilty. Suppose that when the defendant is in fact guilty, each 
judge will independently vote guilty with probability .7, whereas when the 
defendant is in fact innocent, this probability drops to .2. If 70 percent of 
defendants are guilty, compute the conditional probability that judge number 3 
votes guilty given that 

a. judges 1 and 2 vote guilty; 

b. judges 1 and 2 cast 1 guilty and 1 not guilty vote; 

c. judges 1 and 2 both cast not guilty votes. 


Let $E_i$, i = 1, 2, 3, denote the event that judge i casts a guilty vote. Are these
events independent? Are they conditionally independent? Explain. 
3.95. Each of n workers is independently qualified to do an incoming job with 
probability p. If none of them is qualified then the job is rejected; otherwise the 
job is assigned to a randomly chosen one of the qualified workers. Find the 
probability that worker 1 is assigned to the first incoming job. Hint: Condition 
on whether or not at least one worker is qualified. 
3.96. Suppose in the preceding problem that n = 2 and that worker i is 
qualified with probability $p_i$, i = 1, 2.
a. Find the probability that worker 1 is assigned to the first incoming job. 
b. Given that worker 1 is assigned to the first job, find the conditional 
probability that worker 2 was qualified for that job. 


3.97. Each member of a population of size n is, independently of other
members, female with probability p or male with probability 1 - p. Two
individuals of the same sex will, independently of other pairs, be friends with
probability α, whereas two individuals of opposite sex will be friends with
probability β. Let $A_{k,r}$ be the event that persons k and r are friends.

a. Find $P(A_{1,2})$.

b. Are $A_{1,2}$ and $A_{1,3}$ independent?

c. Are $A_{1,2}$ and $A_{1,3}$ conditionally independent given the sex of person 1?

d. Find $P(A_{1,2} A_{1,3})$.


Theoretical Exercises 


3.1. Show that if P(A) > 0, then 
$P(AB \mid A) \ge P(AB \mid A \cup B)$


3.2. Let $A \subset B$. Express the following probabilities as simply as possible:
$P(A \mid B)$, $P(A \mid B^c)$, $P(B \mid A)$, $P(B \mid A^c)$


3.3. Consider a school community of m families, with $n_i$ of them having i
children, $i = 1, \ldots, k$, $\sum_{i=1}^{k} n_i = m$. Consider the following two methods for
choosing a child:
1. Choose one of the m families at random and then randomly choose a
child from that family.
2. Choose one of the $\sum_{i=1}^{k} i n_i$ children at random.

Show that method 1 is more likely than method 2 to result in the choice of a
firstborn child.
Hint: In solving this problem, you will need to show that

\[ \sum_{i=1}^{k} i n_i \sum_{j=1}^{k} \frac{n_j}{j} \ge \sum_{i=1}^{k} n_i \sum_{j=1}^{k} n_j \]

To do so, multiply the sums and show that for all pairs i, j, the coefficient of
the term $n_i n_j$ is greater in the expression on the left than in the one on the
right.

3.4. A ball is in any one of n boxes and is in the ith box with probability $p_i$. If
the ball is in box i, a search of that box will uncover it with probability $\alpha_i$. Show
that the conditional probability that the ball is in box j, given that a search of
box i did not uncover it, is

\[ \frac{p_j}{1 - \alpha_i p_i} \quad \text{if } j \ne i, \qquad \frac{(1 - \alpha_i) p_i}{1 - \alpha_i p_i} \quad \text{if } j = i \]
3.5.
a. Prove that if E and F are mutually exclusive, then

\[ P(E \mid E \cup F) = \frac{P(E)}{P(E) + P(F)} \]

b. Prove that if $E_i$, $i \ge 1$, are mutually exclusive, then

\[ P\left( E_j \,\Big|\, \bigcup_{i=1}^{\infty} E_i \right) = \frac{P(E_j)}{\sum_{i=1}^{\infty} P(E_i)} \]


3.6. Prove that if $E_1, E_2, \ldots, E_n$ are independent events, then

\[ P(E_1 \cup E_2 \cup \cdots \cup E_n) = 1 - \prod_{i=1}^{n} [1 - P(E_i)] \]


3.7.

a. An urn contains n white and m black balls. The balls are withdrawn one 
at a time until only those of the same color are left. Show that with 
probability n/(n + m), they are all white. 

Hint: Imagine that the experiment continues until all the balls are 
removed, and consider the last ball withdrawn. 

b. A pond contains 3 distinct species of fish, which we will call the Red, 
Blue, and Green fish. There are r Red, b Blue, and g Green fish. 
Suppose that the fish are removed from the pond in a random order. 
(That is, each selection is equally likely to be any of the remaining fish.) 
What is the probability that the Red fish are the first species to become 
extinct in the pond? 

Hint: Write P{R} = P{RBG} + P{RGB}, and compute the probabilities on 
the right by first conditioning on the last species to be removed. 


3.8. Let A, B, and C be events relating to the experiment of rolling a pair of 
dice. 
a. If

\[ P(A \mid C) > P(B \mid C) \quad \text{and} \quad P(A \mid C^c) > P(B \mid C^c) \]

either prove that P(A) > P(B) or give a counterexample by defining
events A, B, and C for which that relationship is not true.
b. If

\[ P(A \mid C) > P(A \mid C^c) \quad \text{and} \quad P(B \mid C) > P(B \mid C^c) \]

either prove that $P(AB \mid C) > P(AB \mid C^c)$ or give a counterexample by
defining events A, B, and C for which that relationship is not true.


Hint: Let C be the event that the sum of a pair of dice is 10; let A be the event 
that the first die lands on 6; let B be the event that the second die lands on 6. 
3.9. Consider two independent tosses of a fair coin. Let A be the event that 
the first toss results in heads, let B be the event that the second toss results in 
heads, and let C be the event that in both tosses the coin lands on the same 
side. Show that the events A, B, and C are pairwise independent—that is, A 
and B are independent, A and C are independent, and B and C are 
independent—but not independent. 

3.10. Two percent of women age 45 who participate in routine screening have 
breast cancer. Ninety percent of those with breast cancer have positive 
mammographies. Eight percent of the women who do not have breast cancer 
will also have positive mammographies. Given that a woman has a positive 
mammography, what is the probability she has breast cancer? 

3.11. In each of n independent tosses of a coin, the coin lands on heads with 
probability p. How large need n be so that the probability of obtaining at least
one head is at least 1/2?
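3.12. Show that if $0 \le a_i \le 1$, $i = 1, 2, \ldots$, then

\[ \sum_{i=1}^{\infty} \left[ a_i \prod_{j=1}^{i-1} (1 - a_j) \right] + \prod_{i=1}^{\infty} (1 - a_i) = 1 \]

Hint: Suppose that an infinite number of coins are to be flipped. Let $a_i$ be the
probability that the ith coin lands on heads, and consider when the first head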
occurs. 
3.13. The probability of getting a head on a single toss of a coin is p. Suppose 
that A starts and continues to flip the coin until a tail shows up, at which point 
B starts flipping. Then B continues to flip until a tail comes up, at which point A 
takes over, and so on. Let $P_{n,m}$ denote the probability that A accumulates a
total of n heads before B accumulates m. Show that

\[ P_{n,m} = p P_{n-1,m} + (1 - p)(1 - P_{m,n}) \]


*3.14. Suppose that you are gambling against an infinitely rich adversary and
at each stage you either win or lose 1 unit with respective probabilities p and
1 - p. Show that the probability that you eventually go broke is

\[ 1 \quad \text{if } p \le \tfrac{1}{2}, \qquad (q/p)^i \quad \text{if } p > \tfrac{1}{2} \]

where q = 1 - p and where i is your initial fortune.

3.15. Independent trials that result in a success with probability p are 
successively performed until a total of r successes is obtained. Show that the 
probability that exactly n trials are required is 


\[ \binom{n-1}{r-1} p^r (1 - p)^{n-r} \]

Use this result to solve the problem of the points (Example 4j).
Hint: In order for it to take n trials to obtain r successes, how many successes 
must occur in the first n — 1 trials? 
3.16. Independent trials that result in a success with probability p and a failure 
with probability 1 - p are called Bernoulli trials. Let $P_n$ denote the probability
that n Bernoulli trials result in an even number of successes (0 being
considered an even number). Show that

\[ P_n = p(1 - P_{n-1}) + (1 - p) P_{n-1}, \quad n \ge 1 \]

and use this formula to prove (by induction) that

\[ P_n = \frac{1 + (1 - 2p)^n}{2} \]


3.17. Suppose that n independent trials are performed, with trial i being a 
success with probability 1/(2i + 1). Let $P_n$ denote the probability that the total
number of successes that result is an odd number.
a. Find $P_n$ for n = 1, 2, 3, 4, 5.
b. Conjecture a general formula for $P_n$.
c. Derive a formula for $P_n$ in terms of $P_{n-1}$.
d. Verify that your conjecture in part (b) satisfies the recursive formula in 
part (c). Because the recursive formula has a unique solution, this then 
proves that your conjecture is correct. 


3.18. Let $Q_n$ denote the probability that no run of 3 consecutive heads appears
in n tosses of a fair coin. Show that

\[ Q_n = \frac{1}{2} Q_{n-1} + \frac{1}{4} Q_{n-2} + \frac{1}{8} Q_{n-3} \]
\[ Q_0 = Q_1 = Q_2 = 1 \]

Find $Q_8$.

Hint: Condition on the first tail. 
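The recursion can be checked against direct enumeration for small n. The following is a minimal sketch, assuming the recursion Q_n = Q_{n-1}/2 + Q_{n-2}/4 + Q_{n-3}/8 with Q_0 = Q_1 = Q_2 = 1; the function names are hypothetical, and agreement on small cases is evidence rather than a proof.

```python
from fractions import Fraction
from itertools import product


def q_recursive(n):
    """Q_n from the three-term recursion, in exact rational arithmetic."""
    q = [Fraction(1), Fraction(1), Fraction(1)]  # Q_0, Q_1, Q_2
    for k in range(3, n + 1):
        q.append(q[k - 1] / 2 + q[k - 2] / 4 + q[k - 3] / 8)
    return q[n]


def q_brute_force(n):
    """Q_n by listing all 2^n equally likely head-tail sequences."""
    good = sum(1 for seq in product("HT", repeat=n) if "HHH" not in "".join(seq))
    return Fraction(good, 2 ** n)


for n in range(0, 7):
    print(n, q_recursive(n), q_brute_force(n))
```

Exact fractions are used so that the comparison is not affected by rounding.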

3.19. Consider the gambler’s ruin problem, with the exception that A and B 
agree to play no more than n games. Let $P_{n,i}$ denote the probability that A
winds up with all the money when A starts with i and B starts with N - i.
Derive an equation for $P_{n,i}$ in terms of $P_{n-1,i+1}$ and $P_{n-1,i-1}$, and compute
$P_{7,3}$, N = 5.
3.20. Consider two urns, each containing both white and black balls. The 
probabilities of drawing white balls from the first and second urns are, 
respectively, p and p’. Balls are sequentially selected with replacement as 
follows: With probability α, a ball is initially chosen from the first urn, and with
probability 1 - α, it is chosen from the second urn. The subsequent selections
are then made according to the rule that whenever a white ball is drawn (and
replaced), the next ball is drawn from the same urn, but when a black ball is
drawn, the next ball is taken from the other urn. Let $\alpha_n$ denote the probability
that the nth ball is chosen from the first urn. Show that

\[ \alpha_{n+1} = \alpha_n (p + p' - 1) + 1 - p', \quad n \ge 1 \]

and use this formula to prove that

\[ \alpha_n = \frac{1 - p'}{2 - p - p'} + \left( \alpha - \frac{1 - p'}{2 - p - p'} \right) (p + p' - 1)^{n-1} \]

Let $P_n$ denote the probability that the nth ball selected is white. Find $P_n$. Also,
compute $\lim_{n \to \infty} \alpha_n$ and $\lim_{n \to \infty} P_n$.
3.21. The Ballot Problem. In an election, candidate A receives n votes and
candidate B receives m votes, where n > m. Assuming that all of the
(n + m)!/n!m! orderings of the votes are equally likely, let $P_{n,m}$ denote the
probability that A is always ahead in the counting of the votes.

a. Compute $P_{2,1}$, $P_{3,1}$, $P_{3,2}$, $P_{4,1}$, $P_{4,2}$, $P_{4,3}$.

b. Find $P_{n,1}$, $P_{n,2}$.

c. On the basis of your results in parts (a) and (b), conjecture the value of
$P_{n,m}$.

d. Derive a recursion for $P_{n,m}$ in terms of $P_{n-1,m}$ and $P_{n,m-1}$ by
conditioning on who receives the last vote.

e. Use part (d) to verify your conjecture in part (c) by an induction proof on
n + m.


3.22. As a simplified model for weather forecasting, suppose that the weather
(either wet or dry) tomorrow will be the same as the weather today with
probability p. Show that if the weather is dry on January 1, then $P_n$, the
probability that it will be dry n days later, satisfies

\[ P_n = (2p - 1) P_{n-1} + (1 - p), \quad n \ge 1 \]
\[ P_0 = 1 \]

Prove that

\[ P_n = \frac{1}{2} + \frac{1}{2} (2p - 1)^n, \quad n \ge 0 \]
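Before attempting the proof, it can help to confirm numerically that the closed form agrees with the recursion. The sketch below assumes the recursion P_n = (2p - 1)P_{n-1} + (1 - p) with P_0 = 1 and the closed form 1/2 + (1/2)(2p - 1)^n; the function names and the value p = .7 are illustrative choices, not part of the problem.

```python
def dry_prob_recursive(n, p):
    """Iterate P_k = (2p - 1) P_{k-1} + (1 - p), starting from P_0 = 1."""
    prob = 1.0  # the weather is dry on day 0
    for _ in range(n):
        prob = (2.0 * p - 1.0) * prob + (1.0 - p)
    return prob


def dry_prob_closed(n, p):
    """Closed form 1/2 + (1/2)(2p - 1)^n."""
    return 0.5 + 0.5 * (2.0 * p - 1.0) ** n


p = 0.7  # assumed illustrative value
for n in range(6):
    print(n, dry_prob_recursive(n, p), dry_prob_closed(n, p))
```

Both columns approach 1/2 as n grows, which matches the intuition that the initial weather is eventually forgotten when 0 < p < 1.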


3.23. A bag contains a white and b black balls. Balls are chosen from the bag
according to the following method:

1. A ball is chosen at random and is discarded.

2. A second ball is then chosen. If its color is different from that of the
preceding ball, it is replaced in the bag and the process is repeated
from the beginning. If its color is the same, it is discarded and we start
from step 2.

In other words, balls are sampled and discarded until a change in color
occurs, at which point the last ball is returned to the urn and the process starts
anew. Let $P_{a,b}$ denote the probability that the last ball in the bag is white.
Prove that

\[ P_{a,b} = \frac{1}{2} \]

Hint: Use induction on k = a + b.
3.24. A round-robin tournament of n contestants is a tournament in which
each of the $\binom{n}{2}$ pairs of contestants play each other exactly once, with the
outcome of any play being that one of the contestants wins and the other
loses. For a fixed integer k, k < n, a question of interest is whether it is
possible that the tournament outcome is such that for every set of k players,
there is a player who beat each member of that set. Show that if

\[ \binom{n}{k} \left[ 1 - \left( \frac{1}{2} \right)^k \right]^{n-k} < 1 \]

then such an outcome is possible.

Hint: Suppose that the results of the games are independent and that each
game is equally likely to be won by either contestant. Number the $\binom{n}{k}$ sets of
k contestants, and let $B_i$ denote the event that no contestant beat all of the k
players in the ith set. Then use Boole’s inequality to bound $P\left( \bigcup_i B_i \right)$.

3.25. Prove directly that 
$P(E \mid F) = P(E \mid FG) P(G \mid F) + P(E \mid FG^c) P(G^c \mid F)$


3.26. Prove the equivalence of Equations (5.11) and (5.12).

3.27. Extend the definition of conditional independence to more than 2 events. 
3.28. Prove or give a counterexample. If $E_1$ and $E_2$ are independent, then they
are conditionally independent given F. 

3.29. In Laplace’s rule of succession (Example 5e), show that if the first n
flips all result in heads, then the conditional probability that the next m flips
also result in all heads is approximately (n + 1)/(n + m + 1) when k is large.
3.30. In Laplace’s rule of succession (Example 5e), suppose that the first n
flips resulted in r heads and n - r tails. Show that the probability that the
(n + 1)st flip turns up heads is (r + 1)/(n + 2). To do so, you will have to prove
and use the identity

\[ \int_0^1 y^n (1 - y)^m \, dy = \frac{n! \, m!}{(n + m + 1)!} \]

Hint: To prove the identity, let $C(n, m) = \int_0^1 y^n (1 - y)^m \, dy$. Integrating by
parts yields

\[ C(n, m) = \frac{m}{n + 1} C(n + 1, m - 1) \]

Starting with C(n, 0) = 1/(n + 1), prove the identity by induction on m.

3.31. Suppose that a nonmathematical, but philosophically minded, friend of 
yours claims that Laplace’s rule of succession must be incorrect because it 
can lead to ridiculous conclusions. “For instance,” says he, “the rule states 


that if a boy is 10 years old, having lived 10 years, the boy has probability 11/12
of living another year. On the other hand, if the boy has an 80-year-old
grandfather, then, by Laplace’s rule, the grandfather has probability 81/82 of
surviving another year. However, this is ridiculous. Clearly, the boy is more 
likely to survive an additional year than the grandfather is.” How would you 
answer your friend? 


Self-Test Problems and Exercises 


3.1. In a game of bridge, West has no aces. What is the probability of
his partner’s having (a) no aces? (b) 2 or more aces? (c) What would 
the probabilities be if West had exactly 1 ace? 
3.2. The probability that a new car battery functions for more than 
10,000 miles is .8, the probability that it functions for more than 
20,000 miles is .4, and the probability that it functions for more than 
30,000 miles is .1. If a new car battery is still working after 10,000 
miles, what is the probability that 

a. its total life will exceed 20,000 miles? 

b. its additional life will exceed 20,000 miles? 
3.3. How can 20 balls, 10 white and 10 black, be put into two urns so
as to maximize the probability of drawing a white ball if an urn is 
selected at random and a ball is drawn at random from it? 
3.4. Urn A contains 2 white balls and 1 black ball, whereas urn B 
contains 1 white ball and 5 black balls. A ball is drawn at random 
from urn A and placed in urn B. A ball is then drawn from urn B. It 
happens to be white. What is the probability that the ball transferred 
was white? 
3.5. An urn has r red and w white balls that are randomly removed
one at a time. Let $R_i$ be the event that the ith ball removed is red.
Find

a. $P(R_i)$

b. $P(R_5 \mid R_3)$

c. $P(R_3 \mid R_5)$


3.6. An urn contains b black balls and r red balls. One of the balls is 
drawn at random, but when it is put back in the urn, c additional balls 
of the same color are put in with it. Now, suppose that we draw 
another ball. Show that the probability that the first ball was black, 
given that the second ball drawn was red, is b/(b+r +c). 
3.7. A friend randomly chooses two cards, without replacement, from 
an ordinary deck of 52 playing cards. In each of the following 
situations, determine the conditional probability that both cards are 
aces. 
a. You ask your friend if one of the cards is the ace of spades, 
and your friend answers in the affirmative. 
b. You ask your friend if the first card selected is an ace, and 
your friend answers in the affirmative. 
c. You ask your friend if the second card selected is an ace, and 
your friend answers in the affirmative. 
d. You ask your friend if either of the cards selected is an ace, 
and your friend answers in the affirmative. 


3.8. Show that 
\[ \frac{P(H \mid E)}{P(G \mid E)} = \frac{P(H)}{P(G)} \, \frac{P(E \mid H)}{P(E \mid G)} \]


Suppose that, before new evidence is observed, the hypothesis H is 
three times as likely to be true as is the hypothesis G. If the new 
evidence is twice as likely when G is true than it is when H is true, 
which hypothesis is more likely after the evidence has been 
observed? 


3.9. You ask your neighbor to water a sickly plant while you are on 
vacation. Without water, it will die with probability .8; with water, it will 
die with probability .15. You are 90 percent certain that your neighbor 
will remember to water the plant. 
a. What is the probability that the plant will be alive when you 
return? 
b. If the plant is dead upon your return, what is the probability 
that your neighbor forgot to water it? 


3.10. Six balls are to be randomly chosen from an urn containing 8 
red, 10 green, and 12 blue balls. 
a. What is the probability at least one red ball is chosen? 
b. Given that no red balls are chosen, what is the conditional 
probability that there are exactly 2 green balls among the 6 
chosen? 


3.11. A type C battery is in working condition with probability .7, 
whereas a type D battery is in working condition with probability .4. A
battery is randomly chosen from a bin consisting of 8 type C and 6 
type D batteries. 
a. What is the probability that the battery works? 
b. Given that the battery does not work, what is the conditional 
probability that it was a type C battery? 


3.12. Maria will take two books with her on a trip. Suppose that the 
probability that she will like book 1 is .6, the probability that she will 
like book 2 is .5, and the probability that she will like both books is 
.4. Find the conditional probability that she will like book 2 given that
she did not like book 1. 

3.13. Balls are randomly removed from an urn that initially contains 
20 red and 10 blue balls. 

a. What is the probability that all of the red balls are removed
before all of the blue ones have been removed?

Now suppose that the urn initially contains 20 red, 10 blue, and 8 green
balls.

b. Now what is the probability that all of the red balls are 
removed before all of the blue ones have been removed? 

c. What is the probability that the colors are depleted in the order 
blue, red, green? 

d. What is the probability that the group of blue balls is the first of 
the three groups to be removed? 
3.14. A coin having probability .8 of landing on heads is flipped. A
observes the result—either heads or tails—and rushes off to tell B. 
However, with probability .4, A will have forgotten the result by the 
time he reaches B. If A has forgotten, then, rather than admitting this 
to B, he is equally likely to tell B that the coin landed on heads or that 
it landed tails. (If he does remember, then he tells B the correct 
result.) 
a. What is the probability that B is told that the coin landed on 
heads? 
b. What is the probability that B is told the correct result? 
c. Given that B is told that the coin landed on heads, what is the 
probability that it did in fact land on heads? 


3.15. In a certain species of rats, black dominates over brown. 
Suppose that a black rat with two black parents has a brown sibling. 
a. What is the probability that this rat is a pure black rat (as 
opposed to being a hybrid with one black and one brown 
gene)? 

b. Suppose that when the black rat is mated with a brown rat, all 
5 of their offspring are black. Now what is the probability that 
the rat is a pure black rat? 


3.16. 
a. In Problem 3.70b, find the probability that a current flows
from A to B, by conditioning on whether relay 1 closes. 
b. Find the conditional probability that relay 3 is closed given that 
a current flows from A to B. 


3.17. For the k-out-of-n system described in Problem 3.71,
assume that each component independently works with probability
1/2. Find the conditional probability that component 1 is working, given
that the system works, when
a. k = 1, n = 2;
b. k = 2, n = 3.


3.18. Mr. Jones has devised a gambling system for winning at 
roulette. When he bets, he bets on red and places a bet only when 
the 10 previous spins of the roulette have landed on a black number. 
He reasons that his chance of winning is quite large because the 
probability of 11 consecutive spins resulting in black is quite small. 
What do you think of this system? 
3.19. Three players simultaneously toss coins. The coin tossed by
A(B)[C] turns up heads with probability $P_1$($P_2$)[$P_3$]. If one person
gets an outcome different from those of the other two, then he is the 
odd man out. If there is no odd man out, the players flip again and 
continue to do so until they get an odd man out. What is the 
probability that A will be the odd man? 

3.20. Suppose that there are n possible outcomes of a trial, with
outcome i resulting with probability $p_i$, $i = 1, \ldots, n$, $\sum_{i=1}^{n} p_i = 1$. If two
independent trials are observed, what is the probability that the result 
of the second trial is larger than that of the first? 
3.21. If A flips n + 1 and B flips n fair coins, show that the probability
that A gets more heads than B is 1/2.


Hint: Condition on which player has more heads after each has 
flipped n coins. (There are three possibilities. ) 
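The claim can be confirmed exactly for small n by summing over the joint distribution of the two head counts. This is a minimal sketch, not a substitute for the conditioning argument in the hint; the function name `prob_a_more_heads` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_a_more_heads(n):
    """Exact P(A gets more heads than B) when A flips n + 1 and B flips n fair coins."""
    total = Fraction(0)
    for a in range(n + 2):  # a = number of heads for A, 0..n+1
        for b in range(min(a, n + 1)):  # b = number of heads for B, with b < a and b <= n
            total += Fraction(comb(n + 1, a) * comb(n, b), 2 ** (2 * n + 1))
    return total


for n in range(5):
    print(n, prob_a_more_heads(n))
```

Each term counts the equally likely outcomes of the 2n + 1 flips in which A has a heads and B has b heads.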
3.22. Prove or give counterexamples to the following statements: 
a. If E is independent of F and E is independent of G, then E is 
independent of F ∪ G.
b. If E is independent of F, and E is independent of G, and
FG = ∅, then E is independent of F ∪ G.
c. If E is independent of F, and F is independent of G, and E is 
independent of FG, then G is independent of EF. 


3.23. Let A and B be events having positive probability. State 
whether each of the following statements is (i) necessarily true, (ii) 
necessarily false, or (iii) possibly true. 
a. If A and B are mutually exclusive, then they are independent. 
b. If A and B are independent, then they are mutually exclusive. 
c. P(A) = P(B) = .6, and A and B are mutually exclusive. 
d. P(A) = P(B) = .6, and A and B are independent. 


3.24. Rank the following from most likely to least likely to occur: 
A. A fair coin lands on heads. 
B. Three independent trials, each of which is a success with 
probability .8, all result in successes. 
C. Seven independent trials, each of which is a success with 
probability .9, all result in successes. 


3.25. Two local factories, A and B, produce radios. Each radio 
produced at factory A is defective with probability .05, whereas each 


one produced at factory B is defective with probability .01. Suppose 
you purchase two radios that were produced at the same factory, 
which is equally likely to have been either factory A or factory B. If 
the first radio that you check is defective, what is the conditional 
probability that the other one is also defective? 

3.26. Show that if P(A|B) = 1, then $P(B^c \mid A^c) = 1$.

3.27. An urn initially contains 1 red and 1 blue ball. At each stage, a 
ball is randomly withdrawn and replaced by two other balls of the 
same color. (For instance, if the red ball is initially chosen, then there 
would be 2 red and 1 blue balls in the urn when the next selection 
occurs.) Show by mathematical induction that the probability that
there are exactly i red balls in the urn after n stages have been
completed is $\frac{1}{n+1}$, $1 \le i \le n + 1$.
3.28. A total of 2n cards, of which 2 are aces, are to be randomly 
divided among two players, with each player receiving n cards. Each 
player is then to declare, in sequence, whether he or she has 
received any aces. What is the conditional probability that the second 
player has no aces, given that the first player declares in the 
affirmative, when (a) n = 2? (b) n = 10? (c) n = 100? To what does 
the probability converge as n goes to infinity? Why? 
3.29. There are n distinct types of coupons, and each coupon
obtained is, independently of prior types collected, of type i with
probability $p_i$, $\sum_{i=1}^{n} p_i = 1$.
a. If n coupons are collected, what is the probability that one of
each type is obtained?
b. Now suppose that $p_1 = p_2 = \cdots = p_n = 1/n$. Let $E_i$ be the
event that there are no type i coupons among the n collected.
Apply the inclusion-exclusion identity for the probability of the
union of events to $P\left( \bigcup_i E_i \right)$ to prove the identity

\[ n! = \sum_{k=0}^{n} (-1)^k \binom{n}{k} (n - k)^n \]
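The identity in part (b), n! = sum over k from 0 to n of (-1)^k C(n, k)(n - k)^n, can be spot-checked with integer arithmetic before proving it. The sketch below is only a check on small cases; the function name `inclusion_exclusion_sum` is hypothetical.

```python
from math import comb, factorial


def inclusion_exclusion_sum(n):
    """Right-hand side of the identity: sum_{k=0}^{n} (-1)^k C(n, k) (n - k)^n."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** n for k in range(n + 1))


for n in range(1, 8):
    print(n, inclusion_exclusion_sum(n), factorial(n))
```

Python integers have arbitrary precision, so the alternating sum is computed without rounding error.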


3.30. Show that for any events E and F, 
$P(E \mid E \cup F) \ge P(E \mid F)$


Hint: Compute P(E | E U F) by conditioning on whether F occurs. 
3.31. 
a. If the odds of A is 2/3, what is the probability that A occurs?
b. If the odds of A is 5, what is the probability that A occurs?
3.32. A fair coin is flipped 3 times. Let E be the event that all flips
land heads.
a. What are the odds of the event E?
b. What are the conditional odds of the event E given that at least
one of the coins landed heads?


3.33. If the events E, F,G are independent, show that 

P(E|FG‘) = P(E). 

3.34. Players 1, 2, 3 are in a contest. Two of them are randomly
chosen to play a game in round one, with the winner then playing the
remaining player in round two. The winner of the round two game is
the winner of the contest. Assuming that all games are independent
and that i wins when playing against j with probability i/(i + j), find the
probability that 1 is the winner of the contest. Given that 1 is the
winner, what is the conditional probability that 1 did not play in the
first round?

3.35. If 4 balls are randomly chosen from an urn containing 4 red, 5 
white, 6 blue, and 7 green balls, find the conditional probability they 
are all white given that all balls are of the same color. 

3.36. In a 4 player tournament, player 1 plays player 2, player 3 plays 
player 4, with the winners then playing for the championship. 
Suppose that a game between player i and player j is won by player
i with probability i/(i + j). Find the probability that player 1 wins the
championship.

3.37. In a tournament involving players 1, ..., n, players 1 and 2 play a
game, with the loser departing and the winner then playing against
player 3, with the loser of that game departing and the winner then
playing player 4, and so on. The winner of the game against player n
is the tournament winner. Suppose that a game between players i
and j is won by player i with probability i/(i + j).


a. Find the probability that player 3 is the tournament winner. 
b. If n = 4, find the probability that player 4 is the tournament 
winner. 
4.1 Random Variables


When an experiment is performed, we are frequently interested mainly in some 
function of the outcome as opposed to the actual outcome itself. For instance, in 
tossing dice, we are often interested in the sum of the two dice and are not really 
concerned about the separate values of each die. That is, we may be interested in 
knowing that the sum is 7 and may not be concerned over whether the actual 
outcome was (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), or (6, 1). Also, in flipping a coin, we 
may be interested in the total number of heads that occur and not care at all about 
the actual head-tail sequence that results. These quantities of interest, or, more 
formally, these real-valued functions defined on the sample space, are known as 
random variables. 


Because the value of a random variable is determined by the outcome of the 
experiment, we may assign probabilities to the possible values of the random 
variable. 


Example 1a 


Suppose that our experiment consists of tossing 3 fair coins. If we let Y denote 
the number of heads that appear, then Y is a random variable taking on one of 
the values 0, 1, 2, and 3 with respective probabilities 
\[
\begin{aligned}
P\{Y = 0\} &= P\{(t,t,t)\} = \tfrac{1}{8} \\
P\{Y = 1\} &= P\{(t,t,h),(t,h,t),(h,t,t)\} = \tfrac{3}{8} \\
P\{Y = 2\} &= P\{(t,h,h),(h,t,h),(h,h,t)\} = \tfrac{3}{8} \\
P\{Y = 3\} &= P\{(h,h,h)\} = \tfrac{1}{8}
\end{aligned}
\]


Since Y must take on one of the values 0 through 3, we must have 


\[ 1 = P\left(\bigcup_{i=0}^{3} \{Y = i\}\right) = \sum_{i=0}^{3} P\{Y = i\} \]

which, of course, is in accord with the preceding probabilities. 
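
As an informal check, the four probabilities above can be reproduced by listing the eight equally likely outcomes. The following is a minimal Python sketch; the function name `heads_pmf` is an illustrative choice and not part of the text.

```python
from fractions import Fraction
from itertools import product


def heads_pmf(num_coins=3):
    """Return {k: P{Y = k}} where Y is the number of heads in num_coins fair tosses."""
    outcomes = list(product("ht", repeat=num_coins))  # all equally likely sequences
    pmf = {}
    for outcome in outcomes:
        k = outcome.count("h")
        pmf[k] = pmf.get(k, Fraction(0)) + Fraction(1, len(outcomes))
    return pmf


pmf = heads_pmf(3)
total = sum(pmf.values())  # should equal 1, as in the displayed identity
```

Exact fractions are used so that the result can be compared directly with 1/8, 3/8, 3/8, 1/8.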


Example 1b 


A life insurance agent has 2 elderly clients, each of whom has a life insurance 
policy that pays $100,000 upon death. Let Y be the event that the younger one 
dies in the following year, and let O be the event that the older one dies in the 
following year. Assume that Y and O are independent, with respective 
probabilities P(Y) = .05 and P(O) = .10. If X denotes the total amount of money 
(in units of $100,000) that will be paid out this year to any of these clients’
beneficiaries, then X is a random variable that takes on one of the possible 
values 0,1, 2 with respective probabilities 


\[
\begin{aligned}
P\{X = 0\} &= P(Y^c O^c) = P(Y^c)P(O^c) = (.95)(.9) = .855 \\
P\{X = 1\} &= P(YO^c) + P(Y^c O) = (.05)(.9) + (.95)(.1) = .140 \\
P\{X = 2\} &= P(YO) = (.05)(.1) = .005
\end{aligned}
\]

Example 1c 


Four balls are to be randomly selected, without replacement, from an urn that 
contains 20 balls numbered 1 through 20. If X is the largest numbered ball 
selected, then X is a random variable that takes on one of the values 4,5, ...,20. 


Because each of the \(\binom{20}{4}\) possible selections of 4 of the 20 balls is equally likely,
the probability that X takes on each of its possible values is

\[ P\{X = i\} = \frac{\binom{i-1}{3}}{\binom{20}{4}}, \qquad i = 4, \ldots, 20 \]

This is so because the number of selections that result in X = i is the number of
selections that result in ball numbered i and three of the balls numbered 1
through i − 1 being selected. As there are \(\binom{1}{1}\binom{i-1}{3}\) such selections, the
preceding equation follows.


Suppose now that we want to determine P{X > 10}. One way, of course, is to just 
use the preceding to obtain 


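\[ P\{X > 10\} = \sum_{i=11}^{20} P\{X = i\} = \sum_{i=11}^{20} \frac{\binom{i-1}{3}}{\binom{20}{4}} \]

However, a more direct approach for determining P{X > 10} would be to use

\[ P\{X > 10\} = 1 - P\{X \le 10\} = 1 - \frac{\binom{10}{4}}{\binom{20}{4}} \]
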

where the preceding results because X will be less than or equal to 10 when the 
4 balls chosen are among balls numbered 1 through 10. 
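
The two routes to P{X > 10} can be compared numerically. The sketch below is only an illustration; the helper name `pmf_largest` and its default arguments are assumptions introduced here.

```python
from fractions import Fraction
from math import comb


def pmf_largest(i, total=20, drawn=4):
    """P{X = i}: ball i is chosen together with drawn - 1 of the balls 1..i-1."""
    return Fraction(comb(i - 1, drawn - 1), comb(total, drawn))


# Route 1: sum the mass function over i = 11, ..., 20.
direct = sum(pmf_largest(i) for i in range(11, 21))

# Route 2: complement of "all 4 balls are among balls 1 through 10".
complement = 1 - Fraction(comb(10, 4), comb(20, 4))
```

Both quantities reduce to the same fraction, 309/323, which is roughly .957.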


Example 1d 


Independent trials consisting of the flipping of a coin having probability p of 
coming up heads are continually performed until either a head occurs or a total of 
n flips is made. If we let X denote the number of times the coin is flipped, then X 
is a random variable taking on one of the values 1, 2, 3, ...,n with respective 
probabilities 
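\[
\begin{aligned}
P\{X = 1\} &= P\{h\} = p \\
P\{X = 2\} &= P\{(t,h)\} = (1-p)p \\
P\{X = 3\} &= P\{(t,t,h)\} = (1-p)^2 p \\
&\;\;\vdots \\
P\{X = n-1\} &= P\{(\underbrace{t,t,\ldots,t}_{n-2},h)\} = (1-p)^{n-2}p \\
P\{X = n\} &= P\{(\underbrace{t,t,\ldots,t}_{n-1},t),(\underbrace{t,t,\ldots,t}_{n-1},h)\} = (1-p)^{n-1}
\end{aligned}
\]

As a check, note that

\[
\begin{aligned}
P\left(\bigcup_{i=1}^{n}\{X = i\}\right) &= \sum_{i=1}^{n} P\{X = i\} \\
&= \sum_{i=1}^{n-1} p(1-p)^{i-1} + (1-p)^{n-1} \\
&= p\left[\frac{1-(1-p)^{n-1}}{1-(1-p)}\right] + (1-p)^{n-1} \\
&= 1 - (1-p)^{n-1} + (1-p)^{n-1} \\
&= 1
\end{aligned}
\]


Example 1e

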


Suppose that there are r distinct types of coupons and that each time one
obtains a coupon, it is, independently of previous selections, equally likely to be
any one of the r types. One random variable of interest is T, the number of
coupons that need to be collected until one obtains a complete set of at least one
of each type. Rather than derive P{T = n} directly, let us start by considering the
probability that T is greater than n. To do so, fix n and define the events
\(A_1, A_2, \ldots, A_r\) as follows: \(A_j\) is the event that no type j coupon is contained among
the first n coupons collected, j = 1, ..., r. Hence, by the inclusion-exclusion
identity,

\[
\begin{aligned}
P\{T > n\} &= P\left(\bigcup_{j=1}^{r} A_j\right) \\
&= \sum_{j} P(A_j) - \sum\sum_{j_1 < j_2} P(A_{j_1}A_{j_2}) + \cdots \\
&\quad + (-1)^{k+1} \sum\sum\cdots\sum_{j_1 < j_2 < \cdots < j_k} P(A_{j_1}A_{j_2}\cdots A_{j_k}) + \cdots \\
&\quad + (-1)^{r+1} P(A_1 A_2 \cdots A_r)
\end{aligned}
\]

Now, \(A_j\) will occur if each of the n coupons collected is not of type j. Since each
of the coupons will not be of type j with probability (r − 1)/r, we have, by the
assumed independence of the types of successive coupons,

\[ P(A_j) = \left(\frac{r-1}{r}\right)^n \]

Also, the event \(A_{j_1}A_{j_2}\) will occur if none of the first n coupons collected is of
either type \(j_1\) or type \(j_2\). Thus, again using independence, we see that

\[ P(A_{j_1}A_{j_2}) = \left(\frac{r-2}{r}\right)^n \]

The same reasoning gives

\[ P(A_{j_1}A_{j_2}\cdots A_{j_k}) = \left(\frac{r-k}{r}\right)^n \]


and we see that for n > 0,

(1.1)

\[
\begin{aligned}
P\{T > n\} &= r\left(\frac{r-1}{r}\right)^n - \binom{r}{2}\left(\frac{r-2}{r}\right)^n + \binom{r}{3}\left(\frac{r-3}{r}\right)^n - \cdots + (-1)^r\binom{r}{r-1}\left(\frac{1}{r}\right)^n \\
&= \sum_{i=1}^{r-1} \binom{r}{i}\left(\frac{r-i}{r}\right)^n (-1)^{i+1}
\end{aligned}
\]

The probability that T equals n can now be obtained from the preceding formula
by the use of

\[ P\{T > n-1\} = P\{T = n\} + P\{T > n\} \]

or, equivalently,

\[ P\{T = n\} = P\{T > n-1\} - P\{T > n\} \]
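
The two formulas above translate directly into a short computation. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, and the sum is taken up to i = r rather than r − 1. The extra term is zero whenever n > 0, so the result agrees with Equation (1.1) there, and it also makes the function return 1 at n = 0, which the recursion for P{T = 1} needs.

```python
from fractions import Fraction
from math import comb


def prob_T_greater(n, r):
    """P{T > n} by inclusion-exclusion over the r coupon types.

    The i = r term equals (0/r)**n, which is 0 for n > 0 and 1 for n = 0.
    """
    return sum(
        (-1) ** (i + 1) * comb(r, i) * Fraction(r - i, r) ** n
        for i in range(1, r + 1)
    )


def prob_T_equals(n, r):
    """P{T = n} = P{T > n - 1} - P{T > n}, for n >= 1."""
    return prob_T_greater(n - 1, r) - prob_T_greater(n, r)
```

For instance, with r = 3 types, `prob_T_equals(3, 3)` is 2/9, which matches the direct count 3!/3^3.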


Another random variable of interest is the number of distinct types of coupons
that are contained in the first n selections; call this random variable \(D_n\). To
compute \(P\{D_n = k\}\), let us start by fixing attention on a particular set of k distinct
types, and let us then determine the probability that this set constitutes the set of
distinct types obtained in the first n selections. Now, in order for this to be the
situation, it is necessary and sufficient that of the first n coupons obtained,

A: each is one of these k types

B: each of these k types is represented

Now, each coupon selected will be one of the k types with probability k/r, so the
probability that A will be valid is \((k/r)^n\). Also, given that a coupon is of one of the
k types under consideration, it is easy to see that it is equally likely to be of any
one of these k types. Hence, the conditional probability of B given that A occurs
is the same as the probability that a set of n coupons, each equally likely to be
any of k possible types, contains a complete set of all k types. But this is just the
probability that the number needed to amass a complete set, when choosing
among k types, is less than or equal to n and is thus obtainable from Equation
(1.1) with k replacing r. Thus, we have

\[
\begin{aligned}
P(A) &= \left(\frac{k}{r}\right)^n \\
P(B \mid A) &= 1 - \sum_{i=1}^{k-1} \binom{k}{i}\left(\frac{k-i}{k}\right)^n (-1)^{i+1}
\end{aligned}
\]

Finally, as there are \(\binom{r}{k}\) possible choices for the set of k types, we arrive at

\[
\begin{aligned}
P\{D_n = k\} &= \binom{r}{k} P(AB) \\
&= \binom{r}{k}\left(\frac{k}{r}\right)^n \left[1 - \sum_{i=1}^{k-1} \binom{k}{i}\left(\frac{k-i}{k}\right)^n (-1)^{i+1}\right]
\end{aligned}
\]
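
A quick way to gain confidence in the formula for the number of distinct types is to evaluate it for small r and n and confirm that the probabilities sum to 1. The sketch below is illustrative only; `prob_distinct` is a name introduced here, and it assumes n ≥ 1 and 1 ≤ k ≤ r.

```python
from fractions import Fraction
from math import comb


def prob_distinct(k, n, r):
    """P{D_n = k}: exactly k distinct types among the first n coupons (n >= 1)."""
    p_a = Fraction(k, r) ** n  # every coupon is one of the k chosen types
    # P(B | A) = 1 - P{T > n} for a collection of k types, from Equation (1.1).
    p_b_given_a = 1 - sum(
        (-1) ** (i + 1) * comb(k, i) * Fraction(k - i, k) ** n
        for i in range(1, k)
    )
    return comb(r, k) * p_a * p_b_given_a


# Distribution of the number of distinct types after n = 2 draws from r = 3 types.
dist = {k: prob_distinct(k, 2, 3) for k in range(1, 4)}
```

With r = 3 and n = 2 this gives 1/3 for one distinct type, 2/3 for two, and 0 for three, as a direct count confirms.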


Remark We can obtain a useful bound on \(P\{T > n\} = P\left(\bigcup_{j=1}^{r} A_j\right)\)
by using Boole’s inequality along with the inequality \(e^{-x} \ge 1 - x\).

\[
\begin{aligned}
P\{T > n\} &= P\left(\bigcup_{j=1}^{r} A_j\right) \\
&\le \sum_{j=1}^{r} P(A_j) \\
&= r\left(1 - \frac{1}{r}\right)^n \\
&\le r e^{-n/r}
\end{aligned}
\]

The first inequality is Boole’s inequality, which says that the probability of the
union of events is always less than or equal to the sum of the probabilities of
these events, and the last inequality uses that \(e^{-1/r} \ge 1 - 1/r\).


For a random variable X, the function F defined by

\[ F(x) = P\{X \le x\}, \qquad -\infty < x < \infty \]

is called the cumulative distribution function or, more simply, the distribution function
of X. Thus, the distribution function specifies, for all real values x, the probability that
the random variable is less than or equal to x.


Now, suppose that a ≤ b. Then, because the event {X ≤ a} is contained in the event
{X ≤ b}, it follows that F(a), the probability of the former, is less than or equal to F(b),
the probability of the latter. In other words, F(x) is a nondecreasing function of x.
Other general properties of the distribution function are given in Section 4.10.


4.2 Discrete Random Variables 


A random variable that can take on at most a countable number of possible values is 
said to be discrete. For a discrete random variable X, we define the probability mass 
function p(a) of X by 


p(a) = P{X =a} 


The probability mass function p(a) is positive for at most a countable number of
values of a. That is, if X must assume one of the values \(x_1, x_2, \ldots\), then

\[
\begin{aligned}
p(x_i) &\ge 0 \quad \text{for } i = 1, 2, \ldots \\
p(x) &= 0 \quad \text{for all other values of } x
\end{aligned}
\]

Since X must take on one of the values \(x_i\), we have

\[ \sum_{i=1}^{\infty} p(x_i) = 1 \]


It is often instructive to present the probability mass function in a graphical format by
plotting \(p(x_i)\) on the y-axis against \(x_i\) on the x-axis. For instance, if the probability
mass function of X is

\[ p(0) = \frac{1}{4} \qquad p(1) = \frac{1}{2} \qquad p(2) = \frac{1}{4} \]

we can represent this function graphically as shown in Figure 4.1. Similarly, a
graph of the probability mass function of the random variable representing the sum
when two dice are rolled looks like Figure 4.2.


Figure 4.1 (graph of p(x) against x)


Figure 4.2 (graph of p(x) against x for the sum of two dice)


Example 2a 


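The probability mass function of a random variable X is given by
\(p(i) = c\lambda^i/i!\), i = 0, 1, 2, ..., where λ is some positive value. Find (a) P{X = 0} and
(b) P{X > 2}.

Solution


Since \(\sum_{i=0}^{\infty} p(i) = 1\), we have

\[ c \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = 1 \]

which, because \(e^x = \sum_{i=0}^{\infty} x^i/i!\), implies that

\[ c e^{\lambda} = 1 \quad \text{or} \quad c = e^{-\lambda} \]

Hence,

a. \(P\{X = 0\} = e^{-\lambda}\lambda^0/0! = e^{-\lambda}\)

b.
\[
\begin{aligned}
P\{X > 2\} &= 1 - P\{X \le 2\} = 1 - P\{X = 0\} - P\{X = 1\} - P\{X = 2\} \\
&= 1 - e^{-\lambda} - \lambda e^{-\lambda} - \frac{\lambda^2 e^{-\lambda}}{2}
\end{aligned}
\]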
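
The normalization and the tail probability in this example can be checked numerically for any particular λ. The sketch below is a hedged illustration: the function names and the value λ = 1.5 are assumptions chosen for the demonstration, not values from the text.

```python
from math import exp, factorial


def pmf(i, lam):
    """p(i) = c * lam**i / i! with the normalizing constant c = e**(-lam)."""
    return exp(-lam) * lam**i / factorial(i)


def prob_greater_than_2(lam):
    """P{X > 2} = 1 - p(0) - p(1) - p(2)."""
    return 1 - sum(pmf(i, lam) for i in range(3))


lam = 1.5  # hypothetical value, for illustration only
closed_form = 1 - exp(-lam) * (1 + lam + lam**2 / 2)
partial_sum = sum(pmf(i, lam) for i in range(60))  # approaches 1 as more terms are added
```

The closed form and the term-by-term computation agree to floating-point precision.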
The cumulative distribution function F can be expressed in terms of p(a) by

\[ F(a) = \sum_{\text{all } x \le a} p(x) \]

If X is a discrete random variable whose possible values are \(x_1, x_2, x_3, \ldots\), where
\(x_1 < x_2 < x_3 < \cdots\), then the distribution function F of X is a step function. That is, the
value of F is constant in the intervals \((x_{i-1}, x_i)\) and then takes a step (or jump) of
size \(p(x_i)\) at \(x_i\). For instance, if X has a probability mass function given by

\[ p(1) = \frac{1}{4} \qquad p(2) = \frac{1}{2} \qquad p(3) = \frac{1}{8} \qquad p(4) = \frac{1}{8} \]

then its cumulative distribution function is

\[
F(a) = \begin{cases}
0 & a < 1 \\
\frac{1}{4} & 1 \le a < 2 \\
\frac{3}{4} & 2 \le a < 3 \\
\frac{7}{8} & 3 \le a < 4 \\
1 & 4 \le a
\end{cases}
\]

This function is depicted graphically in Figure 4.3.


Figure 4.3 (graph of F(a) against a)


Note that the size of the step at any of the values 1, 2, 3, and 4 is equal to the
probability that X assumes that particular value.
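
The step-function behavior can be seen by evaluating F directly from the mass function. This is a minimal sketch; the dictionary representation and the name `cdf` are illustrative assumptions.

```python
from fractions import Fraction

# Mass function from the example: p(1) = 1/4, p(2) = 1/2, p(3) = 1/8, p(4) = 1/8.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 8), 4: Fraction(1, 8)}


def cdf(a, pmf):
    """F(a) = sum of p(x) over all possible values x <= a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))


# F is constant between possible values and jumps by p(x) at each x.
jump_at_2 = cdf(2, pmf) - cdf(1.999, pmf)
```

Here `jump_at_2` equals p(2) = 1/2, the size of the step at the value 2.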


4.3 Expected Value 


One of the most important concepts in probability theory is that of the expectation of 
a random variable. If X is a discrete random variable having a probability mass 
function p(x), then the expectation, or the expected value, of X, denoted by E[X], is 
defined by 


\[ E[X] = \sum_{x:p(x)>0} x\,p(x) \]

In words, the expected value of X is a weighted average of the possible values that X
can take on, each value being weighted by the probability that X assumes it. For
instance, on the one hand, if the probability mass function of X is given by

\[ p(0) = \frac{1}{2} = p(1) \]

then

\[ E[X] = 0\left(\frac{1}{2}\right) + 1\left(\frac{1}{2}\right) = \frac{1}{2} \]

is just the ordinary average of the two possible values, 0 and 1, that X can assume.
On the other hand, if

\[ p(0) = \frac{1}{3} \qquad p(1) = \frac{2}{3} \]

then

\[ E[X] = 0\left(\frac{1}{3}\right) + 1\left(\frac{2}{3}\right) = \frac{2}{3} \]

is a weighted average of the two possible values 0 and 1, where the value 1 is given
twice as much weight as the value 0, since p(1) = 2p(0).


Another motivation of the definition of expectation is provided by the frequency
interpretation of probabilities. This interpretation (partially justified by the strong law
of large numbers, to be presented in Chapter 8) assumes that if an infinite
sequence of independent replications of an experiment is performed, then, for any
event E, the proportion of time that E occurs will be P(E). Now, consider a random
variable X that must take on one of the values \(x_1, x_2, \ldots, x_n\) with respective probabilities
\(p(x_1), p(x_2), \ldots, p(x_n)\), and think of X as representing our winnings in a single game of
chance. That is, with probability \(p(x_i)\), we shall win \(x_i\) units, i = 1, 2, ..., n. By the
frequency interpretation, if we play this game continually, then the proportion of time
that we win \(x_i\) will be \(p(x_i)\). Since this is true for all i, i = 1, 2, ..., n, it follows that our
average winnings per game will be

\[ \sum_{i=1}^{n} x_i\,p(x_i) = E[X] \]


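Example 3a


Find E[X], where X is the outcome when we roll a fair die.


Solution


Since \(p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \frac{1}{6}\), we obtain

\[ E[X] = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = \frac{7}{2} \]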
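
The definition of expected value is a single weighted sum, which the following Python sketch evaluates for the fair die. The function name `expectation` and the dictionary encoding of the mass function are assumptions made for illustration.

```python
from fractions import Fraction


def expectation(pmf):
    """E[X] = sum of x * p(x) over the values x with p(x) > 0."""
    return sum(x * p for x, p in pmf.items())


# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_die = expectation(die)

# The two-point example with p(0) = 1/3 and p(1) = 2/3.
mean_two_point = expectation({0: Fraction(1, 3), 1: Fraction(2, 3)})
```

The results are 7/2 for the die and 2/3 for the two-point distribution.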


Example 3b 


We say that I is an indicator variable for the event A if

\[
I = \begin{cases}
1 & \text{if } A \text{ occurs} \\
0 & \text{if } A^c \text{ occurs}
\end{cases}
\]

Find E[I].

Solution

Since p(1) = P(A), p(0) = 1 − P(A), we have

\[ E[I] = P(A) \]


That is, the expected value of the indicator variable for the event A is equal to the 
probability that A occurs. 


Example 3c 


A contestant on a quiz show is presented with two questions, questions 1 and 2,
which he is to attempt to answer in some order he chooses. If he decides to try
question i first, then he will be allowed to go on to question j, j ≠ i, only if his
answer to question i is correct. If his initial answer is incorrect, he is not allowed
to answer the other question. The contestant is to receive \(V_i\) dollars if he
answers question i correctly, i = 1, 2. For instance, he will receive \(V_1 + V_2\) dollars
if he answers both questions correctly. If the probability that he knows the answer
to question i is \(P_i\), i = 1, 2, which question should he attempt to answer first so as
to maximize his expected winnings? Assume that the events \(E_i\), i = 1, 2, that he
knows the answer to question i are independent events.


Solution 


On the one hand, if he attempts to answer question 1 first, then he will win

0 with probability \(1 - P_1\)
\(V_1\) with probability \(P_1(1 - P_2)\)
\(V_1 + V_2\) with probability \(P_1 P_2\)

Hence, his expected winnings in this case will be

\[ V_1 P_1(1 - P_2) + (V_1 + V_2)P_1 P_2 \]

On the other hand, if he attempts to answer question 2 first, his expected
winnings will be

\[ V_2 P_2(1 - P_1) + (V_1 + V_2)P_1 P_2 \]

Therefore, it is better to try question 1 first if

\[ V_1 P_1(1 - P_2) \ge V_2 P_2(1 - P_1) \]

or, equivalently, if

\[ \frac{V_1 P_1}{1 - P_1} \ge \frac{V_2 P_2}{1 - P_2} \]

For example, if he is 60 percent certain of answering question 1, worth $200,
correctly and he is 80 percent certain of answering question 2, worth $100,
correctly, then he should attempt to answer question 2 first because

\[ 400 = \frac{(100)(.8)}{.2} > \frac{(200)(.6)}{.4} = 300 \]
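
The comparison can also be made by computing both expected winnings outright. The sketch below is an illustration; `expected_winnings` and its argument layout are assumptions introduced here, and exact fractions stand in for the probabilities .6 and .8.

```python
from fractions import Fraction


def expected_winnings(first, values, probs):
    """Expected winnings when question `first` (1 or 2) is attempted first.

    values = (V1, V2) and probs = (P1, P2); knowing the two answers is
    assumed independent, as in the example.
    """
    i = first - 1  # index of the question tried first
    j = 1 - i      # index of the other question
    only_first = values[i] * probs[i] * (1 - probs[j])
    both = (values[0] + values[1]) * probs[0] * probs[1]
    return only_first + both


values = (200, 100)
probs = (Fraction(3, 5), Fraction(4, 5))
start_with_1 = expected_winnings(1, values, probs)
start_with_2 = expected_winnings(2, values, probs)
```

Starting with question 1 yields an expected $168 and starting with question 2 yields $176, in agreement with the ratio test above.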


Example 3d 
A school class of 120 students is driven in 3 buses to a symphonic performance.
There are 36 students in one of the buses, 40 in another, and 44 in the third bus. 
When the buses arrive, one of the 120 students is randomly chosen. Let X 
denote the number of students on the bus of that randomly chosen student, and 
find E[X]. 


Solution 


Since the randomly chosen student is equally likely to be any of the 120 
students, it follows that 


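\[ P\{X = 36\} = \frac{36}{120} \qquad P\{X = 40\} = \frac{40}{120} \qquad P\{X = 44\} = \frac{44}{120} \]

Hence,

\[ E[X] = 36\left(\frac{3}{10}\right) + 40\left(\frac{1}{3}\right) + 44\left(\frac{11}{30}\right) = \frac{1208}{30} = 40.2667 \]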


However, the average number of students on a bus is 120/3 = 40, showing that 
the expected number of students on the bus of a randomly chosen student is 
larger than the average number of students on a bus. This is a general 
phenomenon, and it occurs because the more students there are on a bus, the 
more likely it is that a randomly chosen student would have been on that bus. As 
a result, buses with many students are given more weight than those with fewer 
students. (See Self-Test Problem 4.4.)
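
The gap between the two averages is easy to reproduce. In this hedged sketch the variable names are illustrative; the bus sizes are the ones given in the example.

```python
from fractions import Fraction

bus_sizes = [36, 40, 44]
total_students = sum(bus_sizes)

# A randomly chosen student is on a bus of a given size with probability size/total,
# so larger buses are weighted more heavily.
mean_seen_by_student = sum(Fraction(size, total_students) * size for size in bus_sizes)

# A randomly chosen bus weights every bus equally.
mean_per_bus = Fraction(total_students, len(bus_sizes))
```

The student-weighted mean is 604/15 (about 40.2667), while the per-bus mean is exactly 40.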


Remark The probability concept of expectation is analogous to the physical concept
of the center of gravity of a distribution of mass. Consider a discrete random variable
X having probability mass function \(p(x_i)\), i ≥ 1. If we now imagine a weightless rod in
which weights with mass \(p(x_i)\), i ≥ 1, are located at the points \(x_i\), i ≥ 1 (see Figure
4.4), then the point at which the rod would be in balance is known as the center of
gravity. For those readers acquainted with elementary statics, it is now a simple
matter to show that this point is at E[X].†


Figure 4.4 (weights on a rod at the points −1, 0, 1, 2)
p(−1) = .10, p(0) = .25, p(1) = .30, p(2) = .35
center of gravity = .9


†To prove this, we must show that the sum of the torques tending to
turn the point around E[X] is equal to 0. That is, we must show that
\(0 = \sum_i (x_i - E[X])\,p(x_i)\), which is immediate.


4.4 Expectation of a Function of a Random 
Variable 


Suppose that we are given a discrete random variable along with its probability mass 
function and that we want to compute the expected value of some function of X, say, 
g(X). How can we accomplish this? One way is as follows: Since g(X) is itself a 
discrete random variable, it has a probability mass function, which can be 
determined from the probability mass function of X. Once we have determined the 
probability mass function of g(X), we can compute E[g(X)] by using the definition of 
expected value. 


Example 4a 


Let X denote a random variable that takes on any of the values —1, 0, and 1 with 
respective probabilities 


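\[ P\{X = -1\} = .2 \qquad P\{X = 0\} = .5 \qquad P\{X = 1\} = .3 \]

Compute \(E[X^2]\).


Solution


Let \(Y = X^2\). Then the probability mass function of Y is given by

\[
\begin{aligned}
P\{Y = 1\} &= P\{X = -1\} + P\{X = 1\} = .5 \\
P\{Y = 0\} &= P\{X = 0\} = .5
\end{aligned}
\]

Hence,

\[ E[X^2] = E[Y] = 1(.5) + 0(.5) = .5 \]

Note that

\[ .5 = E[X^2] \ne (E[X])^2 = .01 \]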


Although the preceding procedure will always enable us to compute the expected
value of any function of X from a knowledge of the probability mass function of X,
there is another way of thinking about E[g(X)]: Since g(X) will equal g(x) whenever
X is equal to x, it seems reasonable that E[g(X)] should just be a weighted average
of the values g(x), with g(x) being weighted by the probability that X is equal to x.
That is, the following result is quite intuitive.


Proposition 4.1 


If X is a discrete random variable that takes on one of the values \(x_i\), i ≥ 1, with
respective probabilities \(p(x_i)\), then, for any real-valued function g,

\[ E[g(X)] = \sum_i g(x_i)\,p(x_i) \]

Before proving this proposition, let us check that it is in accord with the results of
Example 4a. Applying it to that example yields

\[
\begin{aligned}
E[X^2] &= (-1)^2(.2) + 0^2(.5) + 1^2(.3) \\
&= 1(.2 + .3) + 0(.5) \\
&= .5
\end{aligned}
\]

which is in agreement with the result given in Example 4a.
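
Both routes to E[g(X)] can be written out and compared. The following Python sketch is illustrative; the two function names are assumptions, and exact fractions replace the decimals .2, .5, .3 of Example 4a.

```python
from collections import defaultdict
from fractions import Fraction


def expectation_via_pmf_of_g(pmf, g):
    """First build the mass function of Y = g(X), then apply the definition of E[Y]."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf.items():
        pmf_y[g(x)] += p  # group the x values that share the same g(x)
    return sum(y * q for y, q in pmf_y.items())


def expectation_of_function(pmf, g):
    """Proposition 4.1: E[g(X)] = sum of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())


pmf_x = {-1: Fraction(1, 5), 0: Fraction(1, 2), 1: Fraction(3, 10)}
square = lambda x: x * x
mean_x = expectation_of_function(pmf_x, lambda x: x)
```

Both functions return 1/2 for g(x) = x², whereas the square of E[X] is only 1/100.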


Proof of Proposition 4.1 The proof of Proposition 4.1 proceeds, as in the
preceding verification, by grouping together all the terms in \(\sum_i g(x_i)p(x_i)\) having the
same value of \(g(x_i)\). Specifically, suppose that \(y_j\), j ≥ 1, represent the different
values of \(g(x_i)\), i ≥ 1. Then, grouping all the \(g(x_i)\) having the same value gives

\[
\begin{aligned}
\sum_i g(x_i)\,p(x_i) &= \sum_j \sum_{i:g(x_i)=y_j} g(x_i)\,p(x_i) \\
&= \sum_j \sum_{i:g(x_i)=y_j} y_j\,p(x_i) \\
&= \sum_j y_j \sum_{i:g(x_i)=y_j} p(x_i) \\
&= \sum_j y_j\,P\{g(X) = y_j\} \\
&= E[g(X)]
\end{aligned}
\]


Example 4b 
A product that is sold seasonally yields a net profit of b dollars for each unit sold
and a net loss of ℓ dollars for each unit left unsold when the season ends. The
number of units of the product that are ordered at a specific department store
during any season is a random variable having probability mass function
p(i), i ≥ 0. If the store must stock this product in advance, determine the number
of units the store should stock so as to maximize its expected profit.


Solution 


Let X denote the number of units ordered. If s units are stocked, then the profit,
call it P(s), can be expressed as

\[
P(s) = \begin{cases}
bX - (s - X)\ell & \text{if } X \le s \\
sb & \text{if } X > s
\end{cases}
\]

Hence, the expected profit equals

\[
\begin{aligned}
E[P(s)] &= \sum_{i=0}^{s} [bi - (s - i)\ell]\,p(i) + \sum_{i=s+1}^{\infty} sb\,p(i) \\
&= (b + \ell)\sum_{i=0}^{s} i\,p(i) - s\ell\sum_{i=0}^{s} p(i) + sb\left[1 - \sum_{i=0}^{s} p(i)\right] \\
&= (b + \ell)\sum_{i=0}^{s} i\,p(i) - (b + \ell)s\sum_{i=0}^{s} p(i) + sb \\
&= sb + (b + \ell)\sum_{i=0}^{s} (i - s)\,p(i)
\end{aligned}
\]

To determine the optimum value of s, let us investigate what happens to the profit
when we increase s by 1 unit. By substitution, we see that the expected profit in
this case is given by

\[
\begin{aligned}
E[P(s + 1)] &= b(s + 1) + (b + \ell)\sum_{i=0}^{s+1} (i - s - 1)\,p(i) \\
&= b(s + 1) + (b + \ell)\sum_{i=0}^{s} (i - s - 1)\,p(i)
\end{aligned}
\]

Therefore,

\[ E[P(s + 1)] - E[P(s)] = b - (b + \ell)\sum_{i=0}^{s} p(i) \]

Thus, stocking s + 1 units will be better than stocking s units whenever

(4.1)

\[ \sum_{i=0}^{s} p(i) < \frac{b}{b + \ell} \]

Because the left-hand side of Equation (4.1) is increasing in s while the right-
hand side is constant, the inequality will be satisfied for all values of \(s \le s^*\),
where \(s^*\) is the largest value of s satisfying Equation (4.1). Since

\[ E[P(0)] < \cdots < E[P(s^*)] < E[P(s^* + 1)] > E[P(s^* + 2)] > \cdots \]

it follows that stocking \(s^* + 1\) items will lead to a maximum expected profit.
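
The stocking rule can be sketched in a few lines and compared with a brute-force search. Everything specific below is an assumption for illustration: the function names, the demand distribution (uniform on 0 through 4), and the values b = 3 and ℓ = 1 (written `loss` in the code). The code also assumes integer b and ℓ and a demand mass function with finite support.

```python
from fractions import Fraction


def expected_profit(s, pmf, b, loss):
    """E[P(s)]: earn b per unit sold, lose `loss` per unit left unsold."""
    return sum((b * min(i, s) - loss * max(s - i, 0)) * p for i, p in pmf.items())


def optimal_stock(pmf, b, loss):
    """Smallest s whose cumulative demand probability reaches b / (b + loss).

    By Equation (4.1), adding a unit helps exactly while the cumulative
    probability is still below that threshold, so this returns s* + 1.
    """
    threshold = Fraction(b, b + loss)
    s = 0
    cumulative = pmf.get(0, 0)
    while cumulative < threshold:
        s += 1
        cumulative += pmf.get(s, 0)
    return s


demand = {i: Fraction(1, 5) for i in range(5)}  # hypothetical demand distribution
best = optimal_stock(demand, b=3, loss=1)
brute_force = max(range(5), key=lambda s: expected_profit(s, demand, 3, 1))
```

For these hypothetical inputs both approaches select s = 3, with expected profit 21/5.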


Example 4c Utility 


Suppose that you must choose one of two possible actions, each of which can
result in any of n consequences, denoted as \(C_1, \ldots, C_n\). Suppose that if the first
action is chosen, then consequence \(C_i\) will result with probability \(p_i\), i = 1, ..., n,
whereas if the second action is chosen, then consequence \(C_i\) will result with
probability \(q_i\), i = 1, ..., n, where \(\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1\). The following approach
can be used to determine which action to choose: Start by assigning numerical
values to the different consequences. First, identify the least and the most
desirable consequences; call them c and C, respectively. Give consequence c
the value 0 and give C the value 1. Now consider any of the other n − 2
consequences, say, \(C_i\). To value this consequence, imagine that you are given
the choice between either receiving \(C_i\) or taking part in a random experiment that
either earns you consequence C with probability u or consequence c with
probability 1 − u. Clearly, your choice will depend on the value of u. On the one
hand, if u = 1, then the experiment is certain to result in consequence C, and
since C is the most desirable consequence, you will prefer participating in the
experiment to receiving \(C_i\). On the other hand, if u = 0, then the experiment will
result in the least desirable consequence, namely c, so in this case you will
prefer the consequence \(C_i\) to participating in the experiment. Now, as u
decreases from 1 to 0, it seems reasonable that your choice will at some point
switch from participating in the experiment to the certain return of \(C_i\), and at that
critical switch point you will be indifferent between the two alternatives. Take that
indifference probability u as the value of the consequence \(C_i\). In other words, the
value of \(C_i\) is that probability u such that you are indifferent between either
receiving the consequence \(C_i\) or taking part in an experiment that returns
consequence C with probability u or consequence c with probability 1 − u. We
call this indifference probability the utility of the consequence \(C_i\), and we
designate it as \(u(C_i)\).


To determine which action is superior, we need to evaluate each one. Consider
the first action, which results in consequence \(C_i\) with probability \(p_i\), i = 1, ..., n. We
can think of the result of this action as being determined by a two-stage
experiment. In the first stage, one of the values 1, ..., n is chosen according to the
probabilities \(p_1, \ldots, p_n\); if value i is chosen, you receive consequence \(C_i\). However,
since \(C_i\) is equivalent to obtaining consequence C with probability \(u(C_i)\) or
consequence c with probability \(1 - u(C_i)\), it follows that the result of the two-
stage experiment is equivalent to an experiment in which either consequence C
or consequence c is obtained, with C being obtained with probability

\[ \sum_{i=1}^{n} p_i\,u(C_i) \]

Similarly, the result of choosing the second action is equivalent to taking part in
an experiment in which either consequence C or consequence c is obtained, with
C being obtained with probability

\[ \sum_{i=1}^{n} q_i\,u(C_i) \]

Since C is preferable to c, it follows that the first action is preferable to the second
action if

\[ \sum_{i=1}^{n} p_i\,u(C_i) > \sum_{i=1}^{n} q_i\,u(C_i) \]

In other words, the worth of an action can be measured by the expected value of
the utility of its consequence, and the action with the largest expected utility is the
most preferable.


A simple logical consequence of Proposition 4.1 is Corollary 4.1.


Corollary 4.1

If a and b are constants, then

\[ E[aX + b] = aE[X] + b \]

Proof

\[
\begin{aligned}
E[aX + b] &= \sum_{x:p(x)>0} (ax + b)\,p(x) \\
&= a\sum_{x:p(x)>0} x\,p(x) + b\sum_{x:p(x)>0} p(x) \\
&= aE[X] + b
\end{aligned}
\]

The expected value of a random variable X, E[X], is also referred to as the mean or
the first moment of X. The quantity \(E[X^n]\), n ≥ 1, is called the nth moment of X. By
Proposition 4.1, we note that

\[ E[X^n] = \sum_{x:p(x)>0} x^n\,p(x) \]


4.5 Variance 


Given a random variable X along with its distribution function F, it would be extremely 
useful if we were able to summarize the essential properties of F by certain suitably 
defined measures. One such measure would be E[X], the expected value of X. 
However, although E[X] yields the weighted average of the possible values of X, it 
does not tell us anything about the variation, or spread, of these values. For 
instance, although random variables W, Y, and Z having probability mass functions 
determined by 


\[ W = 0 \quad \text{with probability } 1 \]

\[
Y = \begin{cases}
-1 & \text{with probability } \frac{1}{2} \\
+1 & \text{with probability } \frac{1}{2}
\end{cases}
\]

\[
Z = \begin{cases}
-100 & \text{with probability } \frac{1}{2} \\
+100 & \text{with probability } \frac{1}{2}
\end{cases}
\]

all have the same expectation, namely 0, there is a much greater spread in the
possible values of Y than in those of W (which is a constant) and in the possible
values of Z than in those of Y.


Because we expect X to take on values around its mean E[X], it would appear that a 
reasonable way of measuring the possible variation of X would be to look at how far 
apart X would be from its mean, on the average. One possible way to measure this 
variation would be to consider the quantity E[|X − μ|], where μ = E[X]. However, it
turns out to be mathematically inconvenient to deal with this quantity, so a more 
tractable quantity is usually considered—namely, the expectation of the square of the 
difference between X and its mean. We thus have the following definition. 


Definition 
If X is a random variable with mean μ, then the variance of X, denoted by
Var(X), is defined by

\[ \mathrm{Var}(X) = E[(X - \mu)^2] \]

An alternative formula for Var(X) is derived as follows:

\[
\begin{aligned}
\mathrm{Var}(X) &= E[(X - \mu)^2] \\
&= \sum_x (x - \mu)^2 p(x) \\
&= \sum_x (x^2 - 2\mu x + \mu^2)\,p(x) \\
&= \sum_x x^2 p(x) - 2\mu\sum_x x\,p(x) + \mu^2\sum_x p(x) \\
&= E[X^2] - 2\mu^2 + \mu^2 \\
&= E[X^2] - \mu^2
\end{aligned}
\]

That is,

\[ \mathrm{Var}(X) = E[X^2] - (E[X])^2 \]

In words, the variance of X is equal to the expected value of \(X^2\) minus the square of
its expected value. In practice, this formula frequently offers the easiest way to
compute Var(X).


Example 5a 


Calculate Var(X) if X represents the outcome when a fair die is rolled. 
Solution


It was shown in Example 3a that \(E[X] = \frac{7}{2}\). Also,

\[
\begin{aligned}
E[X^2] &= 1^2\left(\frac{1}{6}\right) + 2^2\left(\frac{1}{6}\right) + 3^2\left(\frac{1}{6}\right) + 4^2\left(\frac{1}{6}\right) + 5^2\left(\frac{1}{6}\right) + 6^2\left(\frac{1}{6}\right) \\
&= \left(\frac{1}{6}\right)(91)
\end{aligned}
\]

Hence,

\[ \mathrm{Var}(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12} \]
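
The definition of variance and the alternative formula can be evaluated side by side for the fair die. This is a minimal sketch; the function names are illustrative assumptions.

```python
from fractions import Fraction


def variance_by_definition(pmf):
    """Var(X) = E[(X - mu)^2], with mu = E[X]."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def variance_by_moments(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    first = sum(x * p for x, p in pmf.items())
    second = sum(x * x * p for x, p in pmf.items())
    return second - first**2


die = {face: Fraction(1, 6) for face in range(1, 7)}
var_definition = variance_by_definition(die)
var_moments = variance_by_moments(die)
```

Both give 35/12, the value obtained in the example.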


Because \(\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(X = x)\) is the sum of nonnegative
terms, it follows that Var(X) ≥ 0 or, equivalently, that

\[ E[X^2] \ge (E[X])^2 \]


That is, the expected value of the square of a random variable is at least as large as 
the square of its expected value. 


Example 5b 


The friendship paradox is often expressed as saying that on average your friends 
have more friends than you do. More formally, suppose that there are n people in 
a certain population, labeled 1, 2, ...,n, and that certain pairs of these individuals 
are friends. This friendship network can be graphically represented by having a 
circle for each person and then having a line between circles to indicate that 
those people are friends. For instance, Figure 4.5 indicates that there are 4
people in the community and that persons 1 and 2 are friends, persons 1 and 3
are friends, persons 1 and 4 are friends, and persons 2 and 4 are friends.


Figure 4.5 A friendship graph


Let f(i) denote the number of friends of person i and let \(f = \sum_{i=1}^{n} f(i)\). (Thus,
for the network of Figure 4.5, f(1) = 3, f(2) = 2, f(3) = 1, f(4) = 2 and
f = 8.) Now, let X be a randomly chosen individual, equally likely to be any of
1, 2, ..., n. That is,

\[ P(X = i) = 1/n, \qquad i = 1, \ldots, n \]

Letting g(i) = f(i) in Proposition 4.1, it follows that E[f(X)], the expected
number of friends of X, is

\[ E[f(X)] = \sum_{i=1}^{n} f(i)P(X = i) = \sum_{i=1}^{n} f(i)/n = f/n \]

Also, letting \(g(i) = f^2(i)\), it follows from Proposition 4.1 that \(E[f^2(X)]\), the
expected value of the square of the number of friends of X, is

\[ E[f^2(X)] = \sum_{i=1}^{n} f^2(i)P(X = i) = \sum_{i=1}^{n} f^2(i)/n \]
Consequently, we see that

(5.1)

\[ \frac{E[f^2(X)]}{E[f(X)]} = \frac{\sum_{i=1}^{n} f^2(i)}{f} \]

Now suppose that each of the n individuals writes the names of all their friends,
with each name written on a separate sheet of paper. Thus, an individual with k
friends will use k separate sheets. Because person i has f(i) friends, there will
be \(f = \sum_{i=1}^{n} f(i)\) separate sheets of paper, with each sheet containing one of
the n names. Now choose one of these sheets at random and let Y denote the
name on that sheet. Let us compute E[f(Y)], the expected number of friends of
the person whose name is on the chosen sheet. Now, because person i has f(i)
friends, it follows that i is the name on f(i) of the sheets, and thus i is the name
on the chosen sheet with probability \(\frac{f(i)}{f}\). That is,

\[ P(Y = i) = \frac{f(i)}{f}, \qquad i = 1, \ldots, n \]

Consequently,

(5.2)

\[ E[f(Y)] = \sum_{i=1}^{n} f(i)P(Y = i) = \sum_{i=1}^{n} f^2(i)/f \]

Thus, from (5.1), we see that

\[ E[f(Y)] = \frac{E[f^2(X)]}{E[f(X)]} \ge E[f(X)] \]

where the inequality follows because the expected value of the square of any
random variable is always at least as large as the square of its expectation.
Thus, E[f(X)] ≤ E[f(Y)], which says that the average number of friends that a
randomly chosen individual has is less than (or equal to if all the individuals have
the same number of friends) the average number of friends of a randomly chosen
friend.
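
For the four-person network of Figure 4.5, both averages can be computed from the list of friendships. The sketch below is illustrative; the edge list encodes the friendships stated in the example, and the variable names are assumptions.

```python
from collections import Counter
from fractions import Fraction

# Friendships from Figure 4.5: 1-2, 1-3, 1-4, 2-4.
edges = [(1, 2), (1, 3), (1, 4), (2, 4)]
n = 4

friend_count = Counter()  # friend_count[i] plays the role of f(i)
for a, b in edges:
    friend_count[a] += 1
    friend_count[b] += 1

f_total = sum(friend_count.values())  # f = sum of f(i), the number of sheets

# E[f(X)]: X is uniform over the n individuals.
mean_random_person = Fraction(f_total, n)

# E[f(Y)]: Y is chosen with probability f(i)/f, as in Equation (5.2).
mean_random_friend = Fraction(sum(c * c for c in friend_count.values()), f_total)
```

A randomly chosen person has 2 friends on average, while a randomly chosen friend has 9/4.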


Remark The intuitive reason for the friendship paradox is that X is equally likely 
to be any of the n individuals. On the other hand Y is chosen with a probability 
proportional to its number of friends; that is, the more friends an individual has 
the more likely that individual will be Y. Thus, Y is biased towards individuals with
a large number of friends and so it is not surprising that the average number of 
friends that Y has is larger than the average number of friends that X has. 


The following is a further example illustrating the usefulness of the inequality that the 
expected value of a square is at least as large as the square of the expected value. 

Example 5c


Suppose there are m days in a year, and that each person is independently born
on day r with probability \(p_r\), r = 1, ..., m, \(\sum_{r=1}^{m} p_r = 1\). Let \(A_{i,j}\) be the event that
persons i and j are born on the same day.

a. Find \(P(A_{1,3})\).
b. Find \(P(A_{1,3} \mid A_{1,2})\).
c. Show that \(P(A_{1,3} \mid A_{1,2}) \ge P(A_{1,3})\).


Solution


a. Because the event that 1 and 3 have the same birthday is the union of the
m mutually exclusive events that they were both born on day r, r = 1, ..., m,
we have that

\[ P(A_{1,3}) = \sum_r p_r^2 \]

b. Using the definition of conditional probability, we obtain that

\[ P(A_{1,3} \mid A_{1,2}) = \frac{P(A_{1,2}A_{1,3})}{P(A_{1,2})} = \frac{\sum_r p_r^3}{\sum_r p_r^2} \]

where the preceding used that \(A_{1,2}A_{1,3}\) is the union of the m mutually
exclusive events that 1, 2, 3 were all born on day r, r = 1, ..., m.

c. It follows from parts (a) and (b) that \(P(A_{1,3} \mid A_{1,2}) \ge P(A_{1,3})\) is equivalent to
\(\sum_r p_r^3 \ge \left(\sum_r p_r^2\right)^2\). To prove this inequality, let X be a random variable
that is equal to \(p_r\) with probability \(p_r\). That is, \(P(X = p_r) = p_r\), r = 1, ..., m.
Then

\[ E[X] = \sum_r p_r P(X = p_r) = \sum_r p_r^2 \qquad E[X^2] = \sum_r p_r^2 P(X = p_r) = \sum_r p_r^3 \]

and the result follows because \(E[X^2] \ge (E[X])^2\).


Remark The intuitive reason for why part (c) is true is that if the “popular days” 
are the ones whose probabilities are relatively large, then knowing that 1 and 2 
share the same birthday makes it more likely (than when we have no information) 
that the birthday of 1 is a popular day and that makes it more likely that 3 will 
have the same birthday as does 1. 
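
As a quick check of parts (a) to (c), the sketch below evaluates both probabilities exactly for a given list of day probabilities. The function name and the two example distributions are assumptions chosen for illustration.

```python
from fractions import Fraction

def same_birthday_probs(p):
    """Given day probabilities p_r (summing to 1), return
    (P(A_13), P(A_13 | A_12)) using the formulas from parts (a) and (b)."""
    s2 = sum(x ** 2 for x in p)   # sum of p_r^2
    s3 = sum(x ** 3 for x in p)   # sum of p_r^3
    return s2, s3 / s2

# A hypothetical non-uniform "year" with three days.
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
unconditional, conditional = same_birthday_probs(skewed)
print(unconditional, conditional)

# With equally likely days the two probabilities coincide.
uniform = [Fraction(1, 3)] * 3
print(same_birthday_probs(uniform))
```

The gap between the two values grows as the day probabilities become more uneven, which matches the intuition given in the remark.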


A useful identity is that for any constants a and b, 


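$$\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)$$


To prove this equality, let $\mu = E[X]$ and note from Corollary 4.1 that
$E[aX + b] = a\mu + b$. Therefore,

$$\begin{aligned}
\operatorname{Var}(aX + b) &= E[(aX + b - a\mu - b)^2] \\
&= E[a^2(X - \mu)^2] \\
&= a^2E[(X - \mu)^2] \\
&= a^2\operatorname{Var}(X)
\end{aligned}$$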

Remarks


a. Analogous to the mean being the center of gravity of a distribution of mass,
the variance represents, in the terminology of mechanics, the moment of
inertia.

b. The square root of Var(X) is called the standard deviation of X, and we
denote it by SD(X). That is,

$$\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}$$


Discrete random variables are often classified according to their probability mass 
functions. In the next few sections, we consider some of the more common types. 


4.6 The Bernoulli and Binomial Random 
Variables 


Suppose that a trial, or an experiment, whose outcome can be classified as either a 
success or a failure is performed. If we let X = 1 when the outcome is a success and 
X = 0 when it is a failure, then the probability mass function of X is given by 
(6.1)

$$\begin{aligned}
p(0) &= P\{X = 0\} = 1 - p \\
p(1) &= P\{X = 1\} = p
\end{aligned}$$

where p, $0 \le p \le 1$, is the probability that the trial is a success.


A random variable X is said to be a Bernoulli random variable (after the Swiss
mathematician James Bernoulli) if its probability mass function is given by
Equations (6.1) for some $p \in (0, 1)$.


Suppose now that n independent trials, each of which results in a success with 
probability p or in a failure with probability 1 — p, are to be performed. If X represents 
the number of successes that occur in the n trials, then X is said to be a binomial 
random variable with parameters (n, p). Thus, a Bernoulli random variable is just a 
binomial random variable with parameters (1, p). 


The probability mass function of a binomial random variable having parameters (n, p) 
is given by 


(6.2)

$$p(i) = \binom{n}{i}p^i(1-p)^{n-i}, \quad i = 0, 1, \ldots, n$$


The validity of Equation (6.2) may be verified by first noting that the probability of
any particular sequence of n outcomes containing i successes and n − i failures is,
by the assumed independence of trials, $p^i(1-p)^{n-i}$. Equation (6.2) then follows,
since there are $\binom{n}{i}$ different sequences of the n outcomes leading to i successes
and n − i failures. This perhaps can most easily be seen by noting that there are $\binom{n}{i}$
different choices of the i trials that result in successes. For instance, if n = 4, i = 2,
then there are $\binom{4}{2} = 6$ ways in which the four trials can result in two successes,
namely, any of the outcomes (s, s, f, f), (s, f, s, f), (s, f, f, s), (f, s, s, f), (f, s, f, s),
and (f, f, s, s), where the outcome (s, s, f, f) means, for instance, that the first two
trials are successes and the last two failures. Since each of these outcomes has
probability $p^2(1-p)^2$ of occurring, the desired probability of two successes in the
four trials is $\binom{4}{2}p^2(1-p)^2$.


Note that, by the binomial theorem, the probabilities sum to 1; that is, 


$$\sum_{i=0}^{n} p(i) = \sum_{i=0}^{n}\binom{n}{i}p^i(1-p)^{n-i} = [p + (1-p)]^n = 1$$
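
Equation (6.2) translates directly into code. The sketch below is one minimal way to evaluate it with the Python standard library; the function name `binomial_pmf` is an assumption of this example, not something defined in the text.

```python
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a binomial random variable with parameters (n, p),
    computed directly from Equation (6.2)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

# The n = 4, i = 2 illustration: six sequences, each with probability p^2 (1 - p)^2.
print(binomial_pmf(2, 4, 0.5))

# The probabilities sum to 1 (up to floating-point rounding).
print(sum(binomial_pmf(i, 10, 0.3) for i in range(11)))
```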


Example 6a 

Five fair coins are flipped. If the outcomes are assumed independent, find the 
probability mass function of the number of heads obtained. 

Solution 

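If we let X equal the number of heads (successes) that appear, then X is a
binomial random variable with parameters $(n = 5, p = \tfrac{1}{2})$. Hence, by Equation
(6.2),

$$\begin{aligned}
P\{X = 0\} &= \binom{5}{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^5 = \tfrac{1}{32} \\
P\{X = 1\} &= \binom{5}{1}\left(\tfrac{1}{2}\right)^1\left(\tfrac{1}{2}\right)^4 = \tfrac{5}{32} \\
P\{X = 2\} &= \binom{5}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^3 = \tfrac{10}{32} \\
P\{X = 3\} &= \binom{5}{3}\left(\tfrac{1}{2}\right)^3\left(\tfrac{1}{2}\right)^2 = \tfrac{10}{32} \\
P\{X = 4\} &= \binom{5}{4}\left(\tfrac{1}{2}\right)^4\left(\tfrac{1}{2}\right)^1 = \tfrac{5}{32} \\
P\{X = 5\} &= \binom{5}{5}\left(\tfrac{1}{2}\right)^5\left(\tfrac{1}{2}\right)^0 = \tfrac{1}{32}
\end{aligned}$$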


Example 6b


It is known that screws produced by a certain company will be defective with
probability .01, independently of one another. The company sells the screws in 
packages of 10 and offers a money-back guarantee that at most 1 of the 10 
screws is defective. What proportion of packages sold must the company 
replace? 


Solution 


If X is the number of defective screws in a package, then X is a binomial random 
variable with parameters (10, .01). Hence, the probability that a package will 
have to be replaced is 
$$1 - P\{X = 0\} - P\{X = 1\} = 1 - \binom{10}{0}(.01)^0(.99)^{10} - \binom{10}{1}(.01)^1(.99)^9 \approx .004$$


Thus, only .4 percent of the packages will have to be replaced. 
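
The replacement probability is easy to recompute. The sketch below assumes a helper named `prob_replace` (hypothetical) whose defaults are the package size, defect probability, and guarantee from this example.

```python
from math import comb

def prob_replace(n=10, p=0.01, allowed=1):
    """Probability that a package of n screws contains more than `allowed`
    defectives, when each screw is independently defective with probability p."""
    ok = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(allowed + 1))
    return 1 - ok

print(round(prob_replace(), 4))
```

Changing `allowed` or `n` shows how sensitive the replacement rate is to the terms of the guarantee.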


Example 6c 


The following gambling game, known as the wheel of fortune (or chuck-a-luck), is 
quite popular at many carnivals and gambling casinos: A player bets on one of 
the numbers 1 through 6. Three dice are then rolled, and if the number bet by the 
player appears i times, i = 1, 2,3, then the player wins i units; if the number bet 
by the player does not appear on any of the dice, then the player loses 1 unit. Is 
this game fair to the player? (Actually, the game is played by spinning a wheel 
that comes to rest on a slot labeled by three of the numbers 1 through 6, but this 
variant is mathematically equivalent to the dice version.) 


Solution 


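If we assume that the dice are fair and act independently of one another, then the
number of times that the number bet appears is a binomial random variable with
parameters $(3, \tfrac{1}{6})$. Hence, letting X denote the player's winnings in the game, we
have

$$\begin{aligned}
P\{X = -1\} &= \binom{3}{0}\left(\tfrac{1}{6}\right)^0\left(\tfrac{5}{6}\right)^3 = \tfrac{125}{216} \\
P\{X = 1\} &= \binom{3}{1}\left(\tfrac{1}{6}\right)^1\left(\tfrac{5}{6}\right)^2 = \tfrac{75}{216} \\
P\{X = 2\} &= \binom{3}{2}\left(\tfrac{1}{6}\right)^2\left(\tfrac{5}{6}\right)^1 = \tfrac{15}{216} \\
P\{X = 3\} &= \binom{3}{3}\left(\tfrac{1}{6}\right)^3\left(\tfrac{5}{6}\right)^0 = \tfrac{1}{216}
\end{aligned}$$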


In order to determine whether or not this is a fair game for the player, let us 
calculate E[X]. From the preceding probabilities, we obtain 


$$E[X] = \frac{-125 + 75 + 30 + 3}{216} = \frac{-17}{216}$$


Hence, in the long run, the player will lose 17 units for every 216 games he
plays.
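
The expected winnings can be reproduced exactly with rational arithmetic. This is a minimal sketch; the function name `chuck_a_luck_pmf` is an assumption of the example.

```python
from fractions import Fraction
from math import comb

def chuck_a_luck_pmf():
    """Probability mass function of the player's winnings X in the dice game."""
    p = Fraction(1, 6)                    # chance a single die shows the number bet
    pmf = {}
    for i in range(4):                    # i = number of dice showing the number bet
        prob = comb(3, i) * p ** i * (1 - p) ** (3 - i)
        winnings = i if i > 0 else -1     # win i units, or lose 1 unit if i = 0
        pmf[winnings] = prob
    return pmf

pmf = chuck_a_luck_pmf()
expected = sum(x * prob for x, prob in pmf.items())
print(pmf)
print("E[X] =", expected)
```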


In the next example, we consider the simplest form of the theory of inheritance as
developed by Gregor Mendel (1822–1884).


Example 6d 


Suppose that a particular trait (such as eye color or left-handedness) of a person 
is classified on the basis of one pair of genes, and suppose also that d 
represents a dominant gene and r a recessive gene. Thus, a person with dd 
genes is purely dominant, one with rr is purely recessive, and one with rd is 
hybrid. The purely dominant and the hybrid individuals are alike in appearance. 
Children receive 1 gene from each parent. If, with respect to a particular trait, 2 
hybrid parents have a total of 4 children, what is the probability that 3 of the 4 
children have the outward appearance of the dominant gene? 


Figure 4.6(a) and (b) shows what can happen when hybrid
yellow (dominant) and green (recessive) seeds are crossed.


Figure 4.6 (a) Crossing pure yellow seeds with pure green seeds; (b)
Crossing hybrid first-generation seeds.
[Figure panels: (a) Pure yellow, Pure green; Yellow hybrid. (b) Hybrid, Hybrid;
Pure yellow, Hybrid, Hybrid, Pure green.]


Solution 


If we assume that each child is equally likely to inherit either of 2 genes from
each parent, the probabilities that the child of 2 hybrid parents will have dd, rr,
and rd pairs of genes are, respectively, $\tfrac{1}{4}$, $\tfrac{1}{4}$, and $\tfrac{1}{2}$. Hence, since an offspring will
have the outward appearance of the dominant gene if its gene pair is either dd or
rd, it follows that the number of such children is binomially distributed with
parameters $(4, \tfrac{3}{4})$. Thus, the desired probability is

$$\binom{4}{3}\left(\frac{3}{4}\right)^3\left(\frac{1}{4}\right)^1 = \frac{27}{64}$$

Example 6e 


Consider a jury trial in which it takes 8 of the 12 jurors to convict the defendant;
that is, in order for the defendant to be convicted, at least 8 of the jurors must
vote him guilty. If we assume that jurors act independently and that whether or
not the defendant is guilty, each makes the right decision with probability $\theta$, what
is the probability that the jury renders a correct decision?


Solution


The problem, as stated, is incapable of solution, for there is not yet enough
information. For instance, if the defendant is innocent, the probability of the jury
rendering a correct decision is

$$\sum_{i=5}^{12}\binom{12}{i}\theta^i(1-\theta)^{12-i}$$

whereas, if he is guilty, the probability of a correct decision is

$$\sum_{i=8}^{12}\binom{12}{i}\theta^i(1-\theta)^{12-i}$$

Therefore, if $\alpha$ represents the probability that the defendant is guilty, then, by
conditioning on whether or not he is guilty, we obtain the probability that the jury
renders a correct decision:

$$\alpha\sum_{i=8}^{12}\binom{12}{i}\theta^i(1-\theta)^{12-i} + (1-\alpha)\sum_{i=5}^{12}\binom{12}{i}\theta^i(1-\theta)^{12-i}$$
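
The conditioning argument can be sketched in code as follows. The names `binomial_tail` and `prob_correct`, and the keyword defaults, are assumptions of this example; `theta` and `alpha` play the roles of the per-juror accuracy and the prior probability of guilt.

```python
from math import comb

def binomial_tail(n, p, lo):
    """P{X >= lo} for X binomial with parameters (n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(lo, n + 1))

def prob_correct(theta, alpha, jurors=12, needed=8):
    """Probability that the jury decides correctly.

    If guilty, a correct decision needs at least `needed` correct (guilty) votes.
    If innocent, it needs fewer than `needed` guilty votes, that is, at least
    jurors - needed + 1 correct (not guilty) votes.
    """
    if_guilty = binomial_tail(jurors, theta, needed)
    if_innocent = binomial_tail(jurors, theta, jurors - needed + 1)
    return alpha * if_guilty + (1 - alpha) * if_innocent

print(prob_correct(theta=0.7, alpha=0.5))
```

Because conviction requires a supermajority, the jury is always at least as likely to be right about an innocent defendant as about a guilty one, for any value of `theta`.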


Example 6f 


A communication system consists of n components, each of which will, 
independently, function with probability p. The total system will be able to operate 
effectively if at least one-half of its components function. 


a. For what values of p is a 5-component system more likely to operate 
effectively than a 3-component system? 

b. In general, when is a (2k + 1)-component system better than a
(2k − 1)-component system?


Solution 


a. Because the number of functioning components is a binomial random
variable with parameters (n, p), it follows that the probability that a
5-component system will be effective is

$$\binom{5}{3}p^3(1-p)^2 + \binom{5}{4}p^4(1-p) + p^5$$

whereas the corresponding probability for a 3-component system is

$$\binom{3}{2}p^2(1-p) + p^3$$

Hence, the 5-component system is better if

$$10p^3(1-p)^2 + 5p^4(1-p) + p^5 > 3p^2(1-p) + p^3$$

which reduces to

$$3(p-1)^2(2p-1) > 0$$

or

$$p > \frac{1}{2}$$

b. In general, a system with 2k + 1 components will be better than one with
2k − 1 components if (and only if) $p > \tfrac{1}{2}$. To prove this, consider a system
of 2k + 1 components and let X denote the number of the first 2k − 1 that
function. Then

$$P_{2k+1}(\text{effective}) = P\{X \ge k+1\} + P\{X = k\}\left(1 - (1-p)^2\right) + P\{X = k-1\}p^2$$

which follows because the (2k + 1)-component system will be effective if
either
i. $X \ge k + 1$;
ii. X = k and at least one of the remaining 2 components function; or
iii. X = k − 1 and both of the next 2 components function.


Since

$$P_{2k-1}(\text{effective}) = P\{X \ge k\} = P\{X = k\} + P\{X \ge k+1\}$$

we obtain

$$\begin{aligned}
P_{2k+1}(\text{effective}) - P_{2k-1}(\text{effective})
&= P\{X = k-1\}p^2 - (1-p)^2P\{X = k\} \\
&= \binom{2k-1}{k-1}p^{k-1}(1-p)^k p^2 - (1-p)^2\binom{2k-1}{k}p^k(1-p)^{k-1} \\
&= \binom{2k-1}{k}p^k(1-p)^k\,[\,p - (1-p)\,] \qquad \text{since } \binom{2k-1}{k-1} = \binom{2k-1}{k} \\
&> 0 \iff p > \tfrac{1}{2}
\end{aligned}$$
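
Both parts of the solution can be checked numerically. The sketch below assumes two hypothetical helper names: `p_effective` computes the probability that at least half of n components function, and `gain` evaluates the closed-form difference derived in part (b).

```python
from math import comb

def p_effective(n, p):
    """P{at least one-half of n independent components function}."""
    need = (n + 1) // 2                   # smallest integer that is >= n/2
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(need, n + 1))

def gain(k, p):
    """Closed form for P_{2k+1}(effective) - P_{2k-1}(effective)."""
    return comb(2 * k - 1, k) * p ** k * (1 - p) ** k * (2 * p - 1)

for p in (0.4, 0.5, 0.6):
    print(p, p_effective(3, p), p_effective(5, p), gain(2, p))
```

For p below one-half the larger system is worse, at one-half the two are equally reliable, and above one-half the larger system is better, in line with the sign of 2p − 1.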


4.6.1 Properties of Binomial Random Variables 


We will now examine the properties of a binomial random variable with parameters n 
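and p. To begin, let us compute its expected value and variance. Note that

$$\begin{aligned}
E[X^k] &= \sum_{i=0}^{n} i^k\binom{n}{i}p^i(1-p)^{n-i} \\
&= \sum_{i=1}^{n} i^k\binom{n}{i}p^i(1-p)^{n-i}
\end{aligned}$$

Using the identity

$$i\binom{n}{i} = n\binom{n-1}{i-1}$$

gives

$$\begin{aligned}
E[X^k] &= np\sum_{i=1}^{n} i^{k-1}\binom{n-1}{i-1}p^{i-1}(1-p)^{n-i} \\
&= np\sum_{j=0}^{n-1}(j+1)^{k-1}\binom{n-1}{j}p^{j}(1-p)^{n-1-j} \qquad \text{by letting } j = i - 1 \\
&= np\,E[(Y+1)^{k-1}]
\end{aligned}$$
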
where Y is a binomial random variable with parameters n — 1, p. Setting k = 1 in the 
preceding equation yields 
E[X] = np 
That is, the expected number of successes that occur in n independent trials when 
each is a success with probability p is equal to np. Setting k = 2 in the preceding 


equation and using the preceding formula for the expected value of a binomial 
random variable yields 


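$$\begin{aligned}
E[X^2] &= np\,E[Y + 1] \\
&= np[(n-1)p + 1]
\end{aligned}$$

Since E[X] = np, we obtain

$$\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= np[(n-1)p + 1] - (np)^2 \\
&= np(1-p)
\end{aligned}$$
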

Summing up, we have shown the following: 
If X is a binomial random variable with parameters n and p, then

$$\begin{aligned}
E[X] &= np \\
\operatorname{Var}(X) &= np(1-p)
\end{aligned}$$
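
These formulas, and the moment recursion they came from, can be verified by brute-force summation. The sketch below uses exact fractions; `binomial_moment` and its `shift` argument are names assumed for this example, and the particular n and p are arbitrary.

```python
from fractions import Fraction
from math import comb

def binomial_moment(n, p, k, shift=0):
    """E[(X + shift)^k] for X binomial(n, p), by direct summation over the pmf."""
    return sum((i + shift) ** k * comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(n + 1))

n, p = 7, Fraction(2, 5)                  # arbitrary illustrative parameters
mean = binomial_moment(n, p, 1)
variance = binomial_moment(n, p, 2) - mean ** 2
print(mean, n * p)                        # both equal np
print(variance, n * p * (1 - p))          # both equal np(1 - p)

# The recursion E[X^k] = np E[(Y + 1)^(k-1)], with Y binomial(n - 1, p), for k = 3.
print(binomial_moment(n, p, 3), n * p * binomial_moment(n - 1, p, 2, shift=1))
```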


The following proposition details how the binomial probability mass function first 
increases and then decreases. 


Proposition 6.1 


If X is a binomial random variable with parameters (n, p), where 0 < p < 1, then 
as k goes from 0 to n, P{X = k} first increases monotonically and then decreases 
monotonically, reaching its largest value when k is the largest integer less than or 
equal to (n + 1)p. 


Proof We prove the proposition by considering P{X = k}/P{X = k — 1} and 
determining for what values of k it is greater or less than 1. Now, 


$$\frac{P\{X = k\}}{P\{X = k-1\}} = \frac{\dfrac{n!}{(n-k)!\,k!}\,p^k(1-p)^{n-k}}{\dfrac{n!}{(n-k+1)!\,(k-1)!}\,p^{k-1}(1-p)^{n-k+1}} = \frac{(n-k+1)p}{k(1-p)}$$


Hence, $P\{X = k\} \ge P\{X = k-1\}$ if and only if

$$(n-k+1)p \ge k(1-p)$$

or, equivalently, if and only if

$$k \le (n+1)p$$

and the proposition is proved. 
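
Proposition 6.1 can be illustrated by comparing the claimed location of the maximum with a direct scan of the probability mass function. This is a sketch with assumed helper names; exact fractions are used so that ties, which occur when (n + 1)p is an integer, are not obscured by rounding.

```python
from fractions import Fraction
from math import comb, floor

def binomial_pmf_list(n, p):
    """List of P{X = k}, k = 0, ..., n, for X binomial(n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def binomial_mode(n, p):
    """Largest integer less than or equal to (n + 1)p, for 0 < p < 1."""
    return floor((n + 1) * p)

n, p = 10, Fraction(1, 2)
pmf = binomial_pmf_list(n, p)
mode = binomial_mode(n, p)
print(mode, pmf[mode], max(pmf))
```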


As an illustration of Proposition 6.1, consider Figure 4.7, the graph of the
probability mass function of a binomial random variable with parameters $(10, \tfrac{1}{2})$.


Figure 4.7 Graph of $p(k) = \binom{10}{k}\left(\tfrac{1}{2}\right)^{10}$.


Example 6g


In a U.S. presidential election, the candidate who gains the maximum number of 
votes in a state is awarded the total number of electoral college votes allocated 
to that state. The number of electoral college votes of a given state is roughly 
proportional to the population of that state—that is, a state with population n has 
roughly nc electoral votes. (Actually, it is closer to nc + 2, as a state is given an 
electoral vote for each member it has in the House of Representatives, with the 
number of such representatives being roughly proportional to the population of 
the state, and one electoral college vote for each of its two senators.) Let us 
determine the average power of a citizen in a state of size n in a close 
presidential election, where, by average power in a close election, we mean that 
a voter in a state of size n = 2k + 1 will be decisive if the other n — 1 voters split 
their votes evenly between the two candidates. (We are assuming here that n is 
odd, but the case where n is even is quite similar.) 


Because the election is close, we shall suppose that each of the other n — 1 = 2k 
voters acts independently and is equally likely to vote for either candidate. 
Hence, the probability that a voter in a state of size n = 2k + 1 will make a 
difference to the outcome is the same as the probability that 2k tosses of a fair 
coin land heads and tails an equal number of times. That is, 
$$P\{\text{voter in state of size } 2k+1 \text{ makes a difference}\} = \binom{2k}{k}\left(\frac{1}{2}\right)^{2k} = \frac{(2k)!}{k!\,k!\,2^{2k}}$$

To approximate the preceding equality, we make use of Stirling's approximation,
which says that for k large,

$$k! \sim k^{k+1/2}e^{-k}\sqrt{2\pi}$$

where we say that $a_k \sim b_k$ when the ratio $a_k/b_k$ approaches 1 as k approaches
$\infty$. Hence, it follows that

$$P\{\text{voter in state of size } 2k+1 \text{ makes a difference}\} \sim \frac{(2k)^{2k+1/2}e^{-2k}\sqrt{2\pi}}{k^{2k+1}e^{-2k}(2\pi)2^{2k}} = \frac{1}{\sqrt{k\pi}}$$

Because such a voter (if he or she makes a difference) will affect nc electoral 
votes, the expected number of electoral votes a voter in a state of size n will 
affect—or the voter’s average power—is given by 


$$\text{average power} = nc\,P\{\text{makes a difference}\} \sim \frac{nc}{\sqrt{n\pi/2}} = c\sqrt{2n/\pi}$$


Thus, the average power of a voter in a state of size n is proportional to the 
square root of n, showing that in presidential elections, voters in large states 
have more power than do those in smaller states. 
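
The quality of the Stirling-based estimate is easy to inspect. The sketch below compares the exact probability of an even split among 2k voters with the approximation; both function names are assumptions of this example.

```python
from math import comb, pi, sqrt

def prob_decisive(k):
    """Exact probability that 2k fair-coin voters split evenly."""
    return comb(2 * k, k) / 4 ** k

def stirling_estimate(k):
    """Approximation 1 / sqrt(k * pi) obtained from Stirling's formula."""
    return 1 / sqrt(k * pi)

for k in (1, 10, 100, 1000):
    exact, approx = prob_decisive(k), stirling_estimate(k)
    print(k, exact, approx, exact / approx)
```

The ratio in the last column moves toward 1 as k grows, which is exactly what the relation $a_k \sim b_k$ asserts.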


4.6.2 Computing the binomial distribution function 


Suppose that X is binomial with parameters (n, p). The key to computing its 
distribution function 


$$P\{X \le i\} = \sum_{k=0}^{i}\binom{n}{k}p^k(1-p)^{n-k}, \quad i = 0, 1, \ldots, n$$


is to utilize the following relationship between P{X = k + 1} and P{X = k}, which was
established in the proof of Proposition 6.1:


(6.3)

$$P\{X = k+1\} = \frac{p}{1-p}\,\frac{n-k}{k+1}\,P\{X = k\}$$


Example 6h 


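Let X be a binomial random variable with parameters n = 6, p = .4. Then,
starting with $P\{X = 0\} = (.6)^6$ and recursively employing Equation (6.3), we
obtain

$$\begin{aligned}
P\{X = 0\} &= (.6)^6 \approx .0467 \\
P\{X = 1\} &= \tfrac{4}{6}\cdot\tfrac{6}{1}\,P\{X = 0\} \approx .1866 \\
P\{X = 2\} &= \tfrac{4}{6}\cdot\tfrac{5}{2}\,P\{X = 1\} \approx .3110 \\
P\{X = 3\} &= \tfrac{4}{6}\cdot\tfrac{4}{3}\,P\{X = 2\} \approx .2765 \\
P\{X = 4\} &= \tfrac{4}{6}\cdot\tfrac{3}{4}\,P\{X = 3\} \approx .1382 \\
P\{X = 5\} &= \tfrac{4}{6}\cdot\tfrac{2}{5}\,P\{X = 4\} \approx .0369 \\
P\{X = 6\} &= \tfrac{4}{6}\cdot\tfrac{1}{6}\,P\{X = 5\} \approx .0041
\end{aligned}$$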


A computer program that utilizes the recursion (6.3) to compute the binomial
distribution function is easily written. To compute $P\{X \le i\}$, the program should first
compute P{X = i} and then use the recursion to successively compute
P{X = i − 1}, P{X = i − 2}, and so on.
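
A minimal sketch of such a program follows. It is one possible reading of the description above, not the text's own program: the name `binomial_cdf` is an assumption, and the code presumes 0 < p < 1 and 0 ≤ i ≤ n. It starts from P{X = i} and applies Equation (6.3), solved for P{X = k}, to step downward.

```python
from math import comb

def binomial_cdf(i, n, p):
    """P{X <= i} for X binomial(n, p), assuming 0 < p < 1 and 0 <= i <= n."""
    term = comb(n, i) * p ** i * (1 - p) ** (n - i)    # P{X = i}
    total = term
    for k in range(i - 1, -1, -1):
        # Equation (6.3) rearranged:
        # P{X = k} = ((1 - p) / p) * ((k + 1) / (n - k)) * P{X = k + 1}
        term *= (1 - p) / p * (k + 1) / (n - k)
        total += term
    return total

# Cumulative probabilities for the n = 6, p = .4 case of Example 6h.
print([round(binomial_cdf(i, 6, 0.4), 4) for i in range(7)])
```

Starting at P{X = i} rather than at P{X = 0} means the largest terms are usually computed first, which is a reasonable choice when $(1-p)^n$ is very small.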


Historical note 


Independent trials having a common probability of success p were first studied 
by the Swiss mathematician Jacques Bernoulli (1654–1705). In his book Ars
Conjectandi (The Art of Conjecturing), published in 1713 by his nephew Nicholas,
eight years after his death, Bernoulli showed that if the number of such trials
were large, then the proportion of them that were successes would be close to 
p with a probability near 1. 


Jacques Bernoulli was from the first generation of the most famous 
mathematical family of all time. Altogether, there were between 8 and 12 
Bernoullis, spread over three generations, who made fundamental contributions 
to probability, statistics, and mathematics. One difficulty in knowing their exact 
number is the fact that several had the same name. (For example, two of the
sons of Jacques’s brother Jean were named Jacques and Jean.) Another 
difficulty is that several of the Bernoullis were known by different names in 
different places. Our Jacques (sometimes written Jaques) was, for instance, 
also known as Jakob (sometimes written Jacob) and as James Bernoulli. But 
whatever their number, their influence and output were prodigious. Like the 
Bachs of music, the Bernoullis of mathematics were a family for the ages! 
Example 6i 


If X is a binomial random variable with parameters n = 100 and p = .75, find
P{X = 70} and $P\{X \le 70\}$.


Solution 


A binomial calculator can be used to obtain the following solutions: 


Figure 4.8 Binomial calculator output for p = .75, n = 100, i = 70.


Probability (Number of Successes = i) = .04575381


Probability (Number of Successes ≤ i) = .14954105


4.7 The Poisson Random Variable 


A random variable X that takes on one of the values 0, 1, 2, ... is said to be a Poisson
random variable with parameter $\lambda$ if, for some $\lambda > 0$,


(7.1)

$$p(i) = P\{X = i\} = e^{-\lambda}\frac{\lambda^i}{i!}, \quad i = 0, 1, 2, \ldots$$


Equation (7.1) defines a probability mass function, since

$$\sum_{i=0}^{\infty}p(i) = e^{-\lambda}\sum_{i=0}^{\infty}\frac{\lambda^i}{i!} = e^{-\lambda}e^{\lambda} = 1$$


The Poisson probability distribution was introduced by Siméon Denis Poisson in a 
book he wrote regarding the application of probability theory to lawsuits, criminal 
trials, and the like. This book, published in 1837, was entitled Recherches sur la
probabilité des jugements en matière criminelle et en matière civile (Investigations
into the Probability of Verdicts in Criminal and Civil Matters).


The Poisson random variable has a tremendous range of applications in diverse 
areas because it may be used as an approximation for a binomial random variable 
with parameters (n, p) when n is large and p is small enough so that np is of 
moderate size. To see this, suppose that X is a binomial random variable with 
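parameters (n, p), and let $\lambda = np$. Then

$$\begin{aligned}
P\{X = i\} &= \frac{n!}{(n-i)!\,i!}\,p^i(1-p)^{n-i} \\
&= \frac{n!}{(n-i)!\,i!}\left(\frac{\lambda}{n}\right)^i\left(1-\frac{\lambda}{n}\right)^{n-i} \\
&= \frac{n(n-1)\cdots(n-i+1)}{n^i}\,\frac{\lambda^i}{i!}\,\frac{(1-\lambda/n)^n}{(1-\lambda/n)^i}
\end{aligned}$$

Now, for n large and $\lambda$ moderate,

$$\left(1-\frac{\lambda}{n}\right)^n \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^i} \approx 1, \qquad \left(1-\frac{\lambda}{n}\right)^i \approx 1$$

Hence, for n large and $\lambda$ moderate,

$$P\{X = i\} \approx e^{-\lambda}\frac{\lambda^i}{i!}$$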


In other words, if n independent trials, each of which results in a success with 
probability p, are performed, then when n is large and p is small enough to make np 
moderate, the number of successes occurring is approximately a Poisson random 
variable with parameter $\lambda = np$. This value $\lambda$ (which will later be shown to equal the
expected number of successes) will usually be determined empirically. 
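
To see how close the two distributions are, one can tabulate them side by side. The sketch below uses n = 1000 and p = .003 purely as illustrative values (they do not come from the text), and the function names are assumptions of the example.

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    """P{X = i} for X binomial(n, p)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """P{X = i} for X Poisson with parameter lam, from Equation (7.1)."""
    return exp(-lam) * lam ** i / factorial(i)

n, p = 1000, 0.003                        # illustrative: n large, p small, np = 3
lam = n * p
max_gap = max(abs(binomial_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(15))
for i in range(7):
    print(i, round(binomial_pmf(i, n, p), 5), round(poisson_pmf(i, lam), 5))
print("largest pointwise gap for i < 15:", max_gap)
```

Repeating the comparison with a larger p and the same np (for example n = 10, p = .3) shows the approximation degrading, consistent with the requirement that p be small.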


Some examples of random variables that generally obey the Poisson probability law
[that is, they obey Equation (7.1)] are as follows:


1. The number of misprints on a page (or a group of pages) of a book
2. The number of people in a community who survive to age 100
3. The number of wrong telephone numbers that are dialed in a day
4. The number of packages of dog biscuits sold in a particular store each day
5. The number of customers entering a post office on a given day
6. The number of vacancies occurring during a year in the federal judicial system
7. The number of α-particles discharged in a fixed period of time from some
radioactive material


Each of the preceding and numerous other random variables are approximately 
Poisson for the same reason—namely, because of the Poisson approximation to the 
binomial. For instance, we can suppose that there is a small probability p that each 
letter typed on a page will be misprinted. Hence, the number of misprints on a page 
will be approximately Poisson with $\lambda = np$, where n is the number of letters on a
page. Similarly, we can suppose that each person in a community has some small 
probability of reaching age 100. Also, each person entering a store may be thought 
of as having some small probability of buying a package of dog biscuits, and so on. 


Example 7a 


Suppose that the number of typographical errors on a single page of this book
has a Poisson distribution with parameter $\lambda = \tfrac{1}{2}$. Calculate the probability that
there is at least one error on this page.


Solution 


Letting X denote the number of errors on this page, we have 


$$P\{X \ge 1\} = 1 - P\{X = 0\} = 1 - e^{-1/2} \approx .393$$


Example 7b 


Suppose that the probability that an item produced by a certain machine will be 
defective is .1. Find the probability that a sample of 10 items will contain at most 
1 defective item. 


Solution 

The desired probability is $\binom{10}{0}(.1)^0(.9)^{10} + \binom{10}{1}(.1)^1(.9)^9 = .7361$, whereas
the Poisson approximation yields the value $e^{-1} + e^{-1} \approx .7358$.

Example 7c 


Consider an experiment that consists of counting the number of α particles given
off in a 1-second interval by 1 gram of radioactive material. If we know from past
experience that on the average, 3.2 such α particles are given off, what is a good
approximation to the probability that no more than 2 α particles will appear?


Solution 


If we think of the gram of radioactive material as consisting of a large number n
of atoms, each of which has probability 3.2/n of disintegrating and sending off
an α particle during the second considered, then we see that to a very close
approximation, the number of α particles given off will be a Poisson random
variable with parameter $\lambda = 3.2$. Hence, the desired probability is

$$P\{X \le 2\} = e^{-3.2} + 3.2e^{-3.2} + \frac{(3.2)^2}{2}e^{-3.2} \approx .3799$$
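
Cumulative Poisson probabilities such as this one can be accumulated term by term. The sketch below relies on the relation P{X = k + 1} = (λ/(k + 1)) P{X = k}, which follows directly from Equation (7.1); the function name `poisson_cdf` is an assumption of the example.

```python
from math import exp

def poisson_cdf(i, lam):
    """P{X <= i} for X Poisson with parameter lam."""
    term = exp(-lam)                      # P{X = 0}
    total = term
    for k in range(i):
        term *= lam / (k + 1)             # P{X = k + 1} from P{X = k}
        total += term
    return total

print(round(poisson_cdf(2, 3.2), 4))      # at most 2 particles, as in Example 7c
print(round(1 - poisson_cdf(0, 0.5), 3))  # at least one error, as in Example 7a
```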


Before computing the expected value and variance of the Poisson random variable
with parameter $\lambda$, recall that this random variable approximates a binomial random
variable with parameters n and p when n is large, p is small, and $\lambda = np$. Since such
a binomial random variable has expected value $np = \lambda$ and variance
$np(1-p) = \lambda(1-p) \approx \lambda$ (since p is small), it would seem that both the expected
value and the variance of a Poisson random variable would equal its parameter $\lambda$.
We now verify this result:


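$$\begin{aligned}
E[X] &= \sum_{i=0}^{\infty}\frac{i\,e^{-\lambda}\lambda^i}{i!} \\
&= \lambda\sum_{i=1}^{\infty}\frac{e^{-\lambda}\lambda^{i-1}}{(i-1)!} \\
&= \lambda e^{-\lambda}\sum_{j=0}^{\infty}\frac{\lambda^j}{j!} \qquad \text{by letting } j = i - 1 \\
&= \lambda \qquad \text{since } \sum_{j=0}^{\infty}\frac{\lambda^j}{j!} = e^{\lambda}
\end{aligned}$$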

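Thus, the expected value of a Poisson random variable X is indeed equal to its
parameter $\lambda$. To determine its variance, we first compute $E[X^2]$:

$$\begin{aligned}
E[X^2] &= \sum_{i=0}^{\infty}\frac{i^2e^{-\lambda}\lambda^i}{i!} \\
&= \lambda\sum_{i=1}^{\infty}\frac{i\,e^{-\lambda}\lambda^{i-1}}{(i-1)!} \\
&= \lambda\sum_{j=0}^{\infty}\frac{(j+1)e^{-\lambda}\lambda^j}{j!} \qquad \text{by letting } j = i - 1 \\
&= \lambda\left[\sum_{j=0}^{\infty}\frac{j\,e^{-\lambda}\lambda^j}{j!} + \sum_{j=0}^{\infty}\frac{e^{-\lambda}\lambda^j}{j!}\right] \\
&= \lambda(\lambda + 1)
\end{aligned}$$

where the final equality follows because the first sum is the expected value of a
Poisson random variable with parameter $\lambda$ and the second is the sum of the
probabilities of this random variable. Therefore, since we have shown that $E[X] = \lambda$,
we obtain

$$\begin{aligned}
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 \\
&= \lambda
\end{aligned}$$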

Hence, the expected value and variance of a Poisson random variable are both
equal to its parameter $\lambda$.
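
A numerical check of this result is sketched below. The infinite sums are truncated after a fixed number of terms (the default of 200 is an assumption that is ample for moderate values of the parameter), and the function name is hypothetical.

```python
from math import exp

def poisson_mean_variance(lam, terms=200):
    """Approximate E[X] and Var(X) for X Poisson(lam) by truncated summation."""
    pmf = exp(-lam)                       # P{X = 0}
    mean = 0.0
    second = 0.0                          # running value of E[X^2]
    for i in range(terms):
        mean += i * pmf
        second += i * i * pmf
        pmf *= lam / (i + 1)              # P{X = i + 1} from P{X = i}
    return mean, second - mean ** 2

print(poisson_mean_variance(3.2))
print(poisson_mean_variance(0.5))
```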


We have shown that the Poisson distribution with parameter np is a very good 
approximation to the distribution of the number of successes in n independent trials 
when each trial has probability p of being a success, provided that n is large and p 
small. In fact, it remains a good approximation even when the trials are not 
independent, provided that their dependence is weak. For instance, recall the 
matching problem (Example 5m of Chapter 2) in which n men randomly
select hats from a set consisting of one hat from each person. From the point of view
of the number of men who select their own hat, we may regard the random selection
as the result of n trials where we say that trial i is a success if person i selects his
own hat, $i = 1, \ldots, n$. Defining the events $E_i$, $i = 1, \ldots, n$, by

$$E_i = \{\text{trial } i \text{ is a success}\}$$

it is easy to see that

$$P\{E_i\} = \frac{1}{n} \quad \text{and} \quad P\{E_i \mid E_j\} = \frac{1}{n-1}, \quad j \ne i$$


Thus, we see that although the events $E_i$, $i = 1, \ldots, n$ are not independent, their
dependence, for large n, appears to be weak. Because of this, it seems reasonable
to expect that the number of successes will approximately have a Poisson
distribution with parameter $n \times 1/n = 1$, and indeed this is verified in Example 5m
of Chapter 2.


For a second illustration of the strength of the Poisson approximation when the trials 
are weakly dependent, let us consider again the birthday problem presented in 
Example 5i of Chapter 2. In this example, we suppose that each of n people
is equally likely to have any of the 365 days of the year as his or her birthday, and
the problem is to determine the probability that a set of n independent people all
have different birthdays. A combinatorial argument was used to determine this
probability, which was shown to be less than $\tfrac{1}{2}$ when n = 23.


We can approximate the preceding probability by using the Poisson approximation 
as follows: Imagine that we have a trial for each of the $\binom{n}{2}$ pairs of individuals i and
j, $i \ne j$, and say that trial i, j is a success if persons i and j have the same birthday. If
we let $E_{ij}$ denote the event that trial i, j is a success, then, whereas the $\binom{n}{2}$ events
$E_{ij}$, $1 \le i < j \le n$, are not independent (see Theoretical Exercise 4.21), their
dependence appears to be rather weak. (Indeed, these events are even pairwise
independent, in that any 2 of the events $E_{ij}$ and $E_{kl}$ are independent; again, see
Theoretical Exercise 4.21.) Since $P(E_{ij}) = 1/365$, it is reasonable to suppose
that the number of successes should approximately have a Poisson distribution with
mean $\binom{n}{2}/365 = n(n-1)/730$. Therefore,

$$P\{\text{no 2 people have the same birthday}\} = P\{0 \text{ successes}\} \approx \exp\left\{\frac{-n(n-1)}{730}\right\}$$


To determine the smallest integer n for which this probability is less than $\tfrac{1}{2}$, note that

$$\exp\left\{\frac{-n(n-1)}{730}\right\} \le \frac{1}{2}$$

is equivalent to

$$\exp\left\{\frac{n(n-1)}{730}\right\} \ge 2$$

Taking logarithms of both sides, we obtain

$$n(n-1) \ge 730\log 2 \approx 505.997$$

which yields the solution n = 23, in agreement with the result of Example 5i of
Chapter 2.
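
The agreement between the Poisson approximation and the exact combinatorial answer can be inspected directly. In the sketch below the function names are assumptions of the example; the exact value is the usual product (365/365)(364/365)···((365 − n + 1)/365).

```python
from math import exp

def approx_all_distinct(n, days=365):
    """Poisson approximation exp{-n(n-1)/(2*days)} to P{no 2 share a birthday}."""
    return exp(-n * (n - 1) / (2 * days))

def exact_all_distinct(n, days=365):
    """Exact probability that n people all have different birthdays."""
    prob = 1.0
    for i in range(n):
        prob *= (days - i) / days
    return prob

def smallest_group(threshold=0.5):
    """Smallest n whose approximate probability falls below the threshold."""
    n = 1
    while approx_all_distinct(n) >= threshold:
        n += 1
    return n

print(smallest_group())
for n in (10, 22, 23, 40):
    print(n, round(exact_all_distinct(n), 4), round(approx_all_distinct(n), 4))
```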


Suppose now that we wanted the probability that among the n people, no 3 of them 
have their birthday on the same day. Whereas this now becomes a difficult 
combinatorial problem, it is a simple matter to obtain a good approximation. To begin, 


imagine that we have a trial for each of the $\binom{n}{3}$ triplets i, j, k, where
$1 \le i < j < k \le n$, and call the i, j, k trial a success if persons i, j, and k all have
their birthday on the same day. As before, we can then conclude that the number of
successes is approximately a Poisson random variable with parameter

$$\binom{n}{3}P\{i, j, k \text{ have the same birthday}\} = \binom{n}{3}\left(\frac{1}{365}\right)^2 = \frac{n(n-1)(n-2)}{6 \times (365)^2}$$

Hence,

$$P\{\text{no 3 have the same birthday}\} \approx \exp\left\{\frac{-n(n-1)(n-2)}{799350}\right\}$$

This probability will be less than $\tfrac{1}{2}$ when n is such that

$$n(n-1)(n-2) \ge 799350\log 2 \approx 554067.1$$

which is equivalent to $n \ge 84$. Thus, the approximate probability that at least 3
people in a group of size 84 or larger will have the same birthday exceeds $\tfrac{1}{2}$.
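
The same heuristic extends to any group size. The following sketch assumes a hypothetical function `approx_no_shared` that treats each set of `group` people as a trial with success probability (1/days)^(group − 1) and applies the Poisson approximation to the number of successes.

```python
from math import comb, exp

def approx_no_shared(n, group=3, days=365):
    """Poisson approximation to P{no `group` of the n people share a birthday}."""
    lam = comb(n, group) / days ** (group - 1)   # expected number of successes
    return exp(-lam)

def smallest_n(group, threshold=0.5):
    """Smallest n for which the approximate probability drops below the threshold."""
    n = group
    while approx_no_shared(n, group) >= threshold:
        n += 1
    return n

print(smallest_n(2), smallest_n(3))
```

Keep in mind that for triples this is only the approximate answer; the text notes that the exact calculation is a difficult combinatorial problem.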


For the number of events to occur to approximately have a Poisson distribution, it is 
not essential that all the events have the same probability of occurrence, but only 
that all of these probabilities be small. The following is referred to as the Poisson 
paradigm. 


Poisson paradigm 

Consider n events, with $p_i$ equal to the probability that event i occurs,
$i = 1, \ldots, n$. If all the $p_i$ are "small" and the trials are either independent or at
most "weakly dependent," then the number of these events that occur
approximately has a Poisson distribution with mean $\sum_{i=1}^{n} p_i$.


Our next example not only makes use of the Poisson paradigm, but also illustrates a 
variety of the techniques we have studied so far. 


Example 7d Length of the longest run 


A coin is flipped n times. Assuming that the flips are independent, with each one 
coming up heads with probability p, what is the probability that there is a string of 
k consecutive heads? 


Solution 


We will first use the Poisson paradigm to approximate this probability. Now, if for
$i = 1, \ldots, n-k+1$, we let $H_i$ denote the event that flips $i, i+1, \ldots, i+k-1$ all land
on heads, then the desired probability is that at least one of the events $H_i$ occur.
Because $H_i$ is the event that starting with flip i, the next k flips all land on heads,
it follows that $P(H_i) = p^k$. Thus, when $p^k$ is small, we might think that the number
of the $H_i$ that occur should have an approximate Poisson distribution. However,
such is not the case, because, although the events all have small probabilities,
some of their dependencies are too great for the Poisson distribution to be a
good approximation. For instance, because the conditional probability that flips
$2, \ldots, k+1$ are all heads given that flips $1, \ldots, k$ are all heads is equal to the
probability that flip k + 1 is a head, it follows that

$$P(H_2 \mid H_1) = p$$

which is far greater than the unconditional probability of $H_2$.


The trick that enables us to use a Poisson approximation is to note that there will 
be a string of k consecutive heads either if there is such a string that is 
immediately followed by a tail or if the final k flips all land on heads. 
Consequently, for $i = 1, \ldots, n-k$, let $E_i$ be the event that flips $i, \ldots, i+k-1$ are all
heads and flip i + k is a tail; also, let $E_{n-k+1}$ be the event that flips $n-k+1, \ldots, n$
are all heads. Note that

$$\begin{aligned}
P(E_i) &= p^k(1-p), \quad i \le n-k \\
P(E_{n-k+1}) &= p^k
\end{aligned}$$


Thus, when $p^k$ is small, each of the events $E_i$ has a small probability of
occurring. Moreover, for $i \ne j$, if the events $E_i$ and $E_j$ refer to nonoverlapping
sequences of flips, then $P(E_i \mid E_j) = P(E_i)$; if they refer to overlapping sequences,
then $P(E_i \mid E_j) = 0$. Hence, in both cases, the conditional probabilities are close to
the unconditional ones, indicating that N, the number of the events $E_i$ that occur,
should have an approximate Poisson distribution with mean

$$E[N] = \sum_{i=1}^{n-k+1}P(E_i) = (n-k)p^k(1-p) + p^k$$


Because there will not be a run of k heads if (and only if) N = 0, the preceding 
gives 


$$P(\text{no head strings of length } k) = P(N = 0) \approx \exp\{-(n-k)p^k(1-p) - p^k\}$$


If we let $L_n$ denote the largest number of consecutive heads in the n flips, then,
because $L_n$ will be less than k if (and only if) there are no head strings of length k,
the preceding equation can be written as

$$P\{L_n < k\} \approx \exp\{-(n-k)p^k(1-p) - p^k\}$$


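Now, let us suppose that the coin being flipped is fair; that is, suppose that
p = 1/2. Then the preceding gives

$$P\{L_n < k\} \approx \exp\left\{-\frac{n-k+2}{2^{k+1}}\right\} \approx \exp\left\{-\frac{n}{2^{k+1}}\right\}$$

where the final approximation supposes that $e^{(k-2)/2^{k+1}} \approx 1$ (that is, that $\frac{k-2}{2^{k+1}} \approx 0$).
Let $j = \log_2 n$, and assume that j is an integer. For k = j + i,

$$\frac{n}{2^{k+1}} = \frac{n}{2^{j}2^{i+1}} = \frac{1}{2^{i+1}}$$

Consequently,

$$P\{L_n < j+i\} \approx \exp\{-(1/2)^{i+1}\}$$

which implies that

$$\begin{aligned}
P\{L_n = j+i\} &= P\{L_n < j+i+1\} - P\{L_n < j+i\} \\
&\approx \exp\{-(1/2)^{i+2}\} - \exp\{-(1/2)^{i+1}\}
\end{aligned}$$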


For instance,

$$\begin{aligned}
P\{L_n < j-3\} &\approx e^{-4} \approx .0183 \\
P\{L_n = j-3\} &\approx e^{-2} - e^{-4} \approx .1170 \\
P\{L_n = j-2\} &\approx e^{-1} - e^{-2} \approx .2325 \\
P\{L_n = j-1\} &\approx e^{-1/2} - e^{-1} \approx .2387 \\
P\{L_n = j\} &\approx e^{-1/4} - e^{-1/2} \approx .1723 \\
P\{L_n = j+1\} &\approx e^{-1/8} - e^{-1/4} \approx .1037 \\
P\{L_n = j+2\} &\approx e^{-1/16} - e^{-1/8} \approx .0569 \\
P\{L_n = j+3\} &\approx e^{-1/32} - e^{-1/16} \approx .0298 \\
P\{L_n \ge j+4\} &\approx 1 - e^{-1/32} \approx .0308
\end{aligned}$$


Thus, we observe the rather interesting fact that no matter how large $n$ is, the
length of the longest run of heads in a sequence of $n$ flips of a fair coin will be
within 2 of $\log_2(n) - 1$ with a probability approximately equal to .86.
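The approximate values listed above can be reproduced numerically. The following is a minimal sketch, assuming only the approximation $P\{L_n = j+i\} \approx \exp\{-(1/2)^{i+2}\} - \exp\{-(1/2)^{i+1}\}$ for a fair coin; the function name `longest_run_approx` is our own and is not part of the text.

```python
import math

def longest_run_approx(i):
    """Approximate P{L_n = j + i} for a fair coin, where j = log2(n)."""
    return math.exp(-0.5 ** (i + 2)) - math.exp(-0.5 ** (i + 1))

# Approximate probabilities for L_n = j-3, ..., j+3
table = {i: longest_run_approx(i) for i in range(-3, 4)}
for i, prob in table.items():
    print(f"P(L_n = j{i:+d}) ~ {prob:.4f}")

# Probability that L_n is within 2 of j - 1, that is, L_n in {j-3, ..., j+1}
within_two = sum(table[i] for i in range(-3, 2))
print(f"P(|L_n - (j-1)| <= 2) ~ {within_two:.4f}")
```

Because the sum telescopes to $e^{-1/8} - e^{-4}$, the last printed value does not depend on $n$, which is the point of the observation above.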


We now derive an exact expression for the probability that there is a string of $k$
consecutive heads when a coin that lands on heads with probability $p$ is flipped $n$
times. With the events $E_i$, $i = 1, \ldots, n-k+1$, as defined earlier, and with $L_n$
denoting, as before, the length of the longest run of heads,


$$P(L_n \ge k) = P(\text{there is a string of } k \text{ consecutive heads}) = P\left(\bigcup_{i=1}^{n-k+1} E_i\right)$$


The inclusion-exclusion identity for the probability of a union can be written as


$$P\left(\bigcup_{i=1}^{n-k+1} E_i\right) = \sum_{r=1}^{n-k+1} (-1)^{r+1} \sum_{i_1 < \cdots < i_r} P(E_{i_1} \cdots E_{i_r})$$


Let $S_i$ denote the set of flip numbers to which the event $E_i$ refers. (So, for
instance, $S_1 = \{1, \ldots, k+1\}$.) Now, consider one of the $r$-way intersection
probabilities that does not include the event $E_{n-k+1}$. That is, consider
$P(E_{i_1} \cdots E_{i_r})$ where $i_1 < \cdots < i_r < n-k+1$. On the one hand, if there is any
overlap in the sets $S_{i_1}, \ldots, S_{i_r}$, then this probability is 0. On the other hand, if there
is no overlap, then the events $E_{i_1}, \ldots, E_{i_r}$ are independent. Therefore,


$$P(E_{i_1} \cdots E_{i_r}) = \begin{cases} 0, & \text{if there is any overlap in } S_{i_1}, \ldots, S_{i_r} \\ p^{kr}(1-p)^r, & \text{if there is no overlap} \end{cases}$$


We must now determine the number of different choices of
$i_1 < \cdots < i_r < n-k+1$ for which there is no overlap in the sets $S_{i_1}, \ldots, S_{i_r}$. To do
so, note first that each of the $S_{i_j}$, $j = 1, \ldots, r$, refers to $k+1$ flips, so, without any
overlap, they together refer to $r(k+1)$ flips. Now consider any permutation of $r$
identical letters $a$ and of $n - r(k+1)$ identical letters $b$. Interpret the number of $b$'s
before the first $a$ as the number of flips before $S_{i_1}$, the number of $b$'s between
the first and second $a$ as the number of flips between $S_{i_1}$ and $S_{i_2}$, and so on, with
the number of $b$'s after the final $a$ representing the number of flips after $S_{i_r}$.
Because there are $\binom{n-rk}{r}$ permutations of $r$ letters $a$ and of $n - r(k+1)$ letters
$b$, with every such permutation corresponding (in a one-to-one fashion) to a
different nonoverlapping choice, it follows that


$$\sum_{i_1 < \cdots < i_r < n-k+1} P(E_{i_1} \cdots E_{i_r}) = \binom{n-rk}{r} p^{kr}(1-p)^r$$


We must now consider $r$-way intersection probabilities of the form


$$P(E_{i_1} \cdots E_{i_{r-1}} E_{n-k+1})$$


where $i_1 < \cdots < i_{r-1} < n-k+1$. Now, this probability will equal 0 if there is any
overlap in $S_{i_1}, \ldots, S_{i_{r-1}}, S_{n-k+1}$; if there is no overlap, then the events of the
intersection will be independent, so


$$P(E_{i_1} \cdots E_{i_{r-1}} E_{n-k+1}) = [p^k(1-p)]^{r-1} p^k = p^{kr}(1-p)^{r-1}$$


By a similar argument as before, the number of nonoverlapping sets $S_{i_1}, \ldots, S_{i_{r-1}}, S_{n-k+1}$
will equal the number of permutations of $r-1$ letters $a$ (one for each of the
sets $S_{i_1}, \ldots, S_{i_{r-1}}$) and of $n - (r-1)(k+1) - k = n - rk - (r-1)$ letters $b$ (one
for each of the trials that are not part of any of $S_{i_1}, \ldots, S_{i_{r-1}}, S_{n-k+1}$). Since there
are $\binom{n-rk}{r-1}$ permutations of $r-1$ letters $a$ and of $n - rk - (r-1)$ letters $b$, we
have


$$\sum_{i_1 < \cdots < i_{r-1} < n-k+1} P(E_{i_1} \cdots E_{i_{r-1}} E_{n-k+1}) = \binom{n-rk}{r-1} p^{kr}(1-p)^{r-1}$$


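Putting it all together yields the exact expression, namely,


$$P(L_n \ge k) = \sum_{r=1}^{n-k+1} (-1)^{r+1} \left[ \binom{n-rk}{r} + \frac{1}{1-p} \binom{n-rk}{r-1} \right] p^{kr}(1-p)^r$$

where the factor $\frac{1}{1-p}$ arises because the intersections that include $E_{n-k+1}$ carry
$(1-p)^{r-1}$ rather than $(1-p)^r$, and where we utilize the convention that $\binom{m}{j} = 0$ if $m < j$.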
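As a check on the inclusion-exclusion expression, the sketch below evaluates the sum $\sum_r (-1)^{r+1}\left[\binom{n-rk}{r} + \frac{1}{1-p}\binom{n-rk}{r-1}\right]p^{kr}(1-p)^r$ and compares it with a brute-force enumeration of all $2^n$ sequences. The weight $1/(1-p)$ on the second coefficient reflects the term $p^{kr}(1-p)^{r-1}$ derived above. The function names are illustrative, and the enumeration is feasible only for small $n$.

```python
from itertools import product
from math import comb

def binom(m, j):
    """Binomial coefficient with the convention that it is 0 when m < j."""
    return comb(m, j) if 0 <= j <= m else 0

def prob_run_exact(n, k, p):
    """P(L_n >= k) from the inclusion-exclusion sum."""
    q = 1 - p
    total = 0.0
    for r in range(1, n - k + 2):
        bracket = binom(n - r * k, r) + binom(n - r * k, r - 1) / q
        total += (-1) ** (r + 1) * bracket * p ** (k * r) * q ** r
    return total

def prob_run_brute(n, k, p):
    """P(L_n >= k) by enumerating all 2^n head/tail sequences (small n only)."""
    total = 0.0
    for flips in product("ht", repeat=n):
        if "h" * k in "".join(flips):
            heads = flips.count("h")
            total += p ** heads * (1 - p) ** (n - heads)
    return total

print(prob_run_exact(4, 2, 0.5), prob_run_brute(4, 2, 0.5))
print(prob_run_exact(10, 3, 0.3), prob_run_brute(10, 3, 0.3))
```

The two numbers on each printed line should agree up to floating-point rounding.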
From a computational point of view, a more efficient method for computing the
desired probability than the use of the preceding identity is to derive a set of
recursive equations. To do so, let $A_n$ be the event that there is a string of $k$
consecutive heads in a sequence of $n$ flips, and let $P_n = P(A_n)$. We will derive a
set of recursive equations for $P_n$ by conditioning on when the first tail appears.
For $j = 1, \ldots, k$, let $F_j$ be the event that the first tail appears on flip $j$, and let $H$ be
the event that the first $k$ flips are all heads. Because the events $F_1, \ldots, F_k, H$ are
mutually exclusive and exhaustive (that is, exactly one of these events must
occur), we have


$$P(A_n) = \sum_{j=1}^{k} P(A_n \mid F_j) P(F_j) + P(A_n \mid H) P(H)$$


Now, given that the first tail appears on flip $j$, where $j \le k$, it follows that those $j$
flips are wasted as far as obtaining a string of $k$ heads in a row; thus, the
conditional probability of this event is the probability that such a string will occur
among the remaining $n - j$ flips. Therefore,


$$P(A_n \mid F_j) = P_{n-j}$$


Because $P(A_n \mid H) = 1$, the preceding equation gives


$$P_n = P(A_n) = \sum_{j=1}^{k} P_{n-j} P(F_j) + P(H) = \sum_{j=1}^{k} P_{n-j}\, p^{j-1}(1-p) + p^k$$


Starting with $P_j = 0$, $j < k$, and $P_k = p^k$, we can use the latter formula to
recursively compute $P_{k+1}, P_{k+2}$, and so on, up to $P_n$. For instance, suppose we
want the probability that there is a run of 2 consecutive heads when a fair coin is
flipped 4 times. Then, with $k = 2$, we have $P_1 = 0$, $P_2 = (1/2)^2$. Because, when
$p = 1/2$, the recursion becomes


$$P_n = \sum_{j=1}^{k} P_{n-j} (1/2)^j + (1/2)^k$$


we obtain


$$P_3 = P_2(1/2) + P_1(1/2)^2 + (1/2)^2 = 3/8$$

and


$$P_4 = P_3(1/2) + P_2(1/2)^2 + (1/2)^2 = 1/2$$


which is clearly true because there are 8 outcomes that result in a string of 2
consecutive heads: hhhh, hhht, hhth, hthh, thhh, hhtt, thht, and tthh. Each of
these outcomes occurs with probability 1/16.
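The recursion $P_n = \sum_{j=1}^{k} P_{n-j}\, p^{j-1}(1-p) + p^k$ translates directly into a short loop. The following is a minimal sketch; the function name is our own, and exact fractions are used only so that the worked values 3/8 and 1/2 appear without rounding.

```python
from fractions import Fraction

def prob_run_recursive(n, k, p):
    """Return P_n, the probability of a run of k heads in n flips."""
    q = 1 - p
    # P[m] = 0 for m < k (index 0 is a convenience: no flips, no run)
    P = [p * 0] * (n + 1)
    if n >= k:
        P[k] = p ** k
    for m in range(k + 1, n + 1):
        P[m] = sum(P[m - j] * p ** (j - 1) * q for j in range(1, k + 1)) + p ** k
    return P[n]

half = Fraction(1, 2)
print(prob_run_recursive(3, 2, half))  # 3/8, as in the text
print(prob_run_recursive(4, 2, half))  # 1/2, as in the text
```

Each $P_m$ needs only the $k$ previous values, so the whole computation takes on the order of $nk$ arithmetic operations, in contrast to the alternating sum of binomial coefficients in the exact expression.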


Another use of the Poisson probability distribution arises in situations where "events"
occur at certain points in time. One example is to designate the occurrence of an
earthquake as an event; another possibility would be for events to correspond to
people entering a particular establishment (bank, post office, gas station, and so on);
and a third possibility is for an event to occur whenever a war starts. Let us suppose
that events are indeed occurring at certain (random) points of time, and let us
assume that for some positive constant $\lambda$, the following assumptions hold true:


1. The probability that exactly 1 event occurs in a given interval of length $h$ is
equal to $\lambda h + o(h)$, where $o(h)$ stands for any function $f(h)$ for which
$\lim_{h \to 0} f(h)/h = 0$. [For instance, $f(h) = h^2$ is $o(h)$, whereas $f(h) = h$ is not.]


2. The probability that 2 or more events occur in an interval of length $h$ is equal
to $o(h)$.

3. For any integers $n, j_1, j_2, \ldots, j_n$ and any set of $n$ nonoverlapping intervals, if
we define $E_i$ to be the event that exactly $j_i$ of the events under consideration
occur in the $i$th of these intervals, then events $E_1, E_2, \ldots, E_n$ are independent.


Loosely put, assumptions 1 and 2 state that for small values of h, the probability that 
exactly 1 event occurs in an interval of size $h$ equals $\lambda h$ plus something that is small
compared with h, whereas the probability that 2 or more events occur is small 
compared with h. Assumption 3 states that whatever occurs in one interval has no 
(probability) effect on what will occur in other, nonoverlapping intervals. 


We now show that under assumptions 1, 2, and 3, the number of events occurring in
any interval of length $t$ is a Poisson random variable with parameter $\lambda t$. To be
precise, let us call the interval $[0, t]$ and denote the number of events occurring in
that interval by $N(t)$. To obtain an expression for $P\{N(t) = k\}$, we start by breaking
the interval $[0, t]$ into $n$ nonoverlapping subintervals, each of length $t/n$ (Figure 4.9).


Figure 4.9


Now,


(7.2)
$$\begin{aligned}
P\{N(t) = k\} &= P\{k \text{ of the } n \text{ subintervals contain exactly 1 event and the other } n-k \text{ contain 0 events}\} \\
&\quad + P\{N(t) = k \text{ and at least 1 subinterval contains 2 or more events}\}
\end{aligned}$$


The preceding equation holds because the event on the left side of Equation
(7.2), that is, $\{N(t) = k\}$, is clearly equal to the union of the two mutually exclusive
events on the right side of the equation. Letting $A$ and $B$ denote the two mutually
exclusive events on the right side of Equation (7.2), we have


$$\begin{aligned}
P(B) &\le P\{\text{at least one subinterval contains 2 or more events}\} \\
&= P\left(\bigcup_{i=1}^{n} \{i\text{th subinterval contains 2 or more events}\}\right) \\
&\le \sum_{i=1}^{n} P\{i\text{th subinterval contains 2 or more events}\} \qquad \text{by Boole's inequality} \\
&= \sum_{i=1}^{n} o\!\left(\frac{t}{n}\right) \qquad \text{by assumption 2} \\
&= n\, o\!\left(\frac{t}{n}\right) \\
&= t \left[\frac{o(t/n)}{t/n}\right]
\end{aligned}$$

Now, for any $t$, $t/n \to 0$ as $n \to \infty$, so $o(t/n)/(t/n) \to 0$ as $n \to \infty$, by the definition
of $o(h)$. Hence,


(7.3)
$$P(B) \to 0 \quad \text{as } n \to \infty$$

Moreover, since assumptions 1 and 2 imply that†


$$P\{0 \text{ events occur in an interval of length } h\} = 1 - [\lambda h + o(h) + o(h)] = 1 - \lambda h - o(h)$$


we see from the independence assumption (number 3) that


$$\begin{aligned}
P(A) &= P\{k \text{ of the subintervals contain exactly 1 event and the other } n-k \text{ contain 0 events}\} \\
&= \binom{n}{k} \left[\frac{\lambda t}{n} + o\!\left(\frac{t}{n}\right)\right]^k \left[1 - \frac{\lambda t}{n} - o\!\left(\frac{t}{n}\right)\right]^{n-k}
\end{aligned}$$


†The sum of two functions, both of which are $o(h)$, is also $o(h)$. This is
so because if $\lim_{h \to 0} f(h)/h = \lim_{h \to 0} g(h)/h = 0$, then
$\lim_{h \to 0} [f(h) + g(h)]/h = 0$.


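However, since


$$n\left[\frac{\lambda t}{n} + o\!\left(\frac{t}{n}\right)\right] = \lambda t + t\left[\frac{o(t/n)}{t/n}\right] \to \lambda t \quad \text{as } n \to \infty$$


it follows, by the same argument that verified the Poisson approximation to the
binomial, that


(7.4)
$$P(A) \to e^{-\lambda t} \frac{(\lambda t)^k}{k!} \quad \text{as } n \to \infty$$


Thus, from Equations (7.2), (7.3), and (7.4), by letting $n \to \infty$, we obtain

(7.5)
$$P\{N(t) = k\} = e^{-\lambda t} \frac{(\lambda t)^k}{k!} \qquad k = 0, 1, \ldots$$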


Hence, if assumptions 1, 2, and 3 are satisfied, then the number of events occurring
in any fixed interval of length $t$ is a Poisson random variable with mean $\lambda t$, and we
say that the events occur in accordance with a Poisson process having rate $\lambda$. The
value $\lambda$, which can be shown to equal the rate per unit time at which events occur, is
a constant that must be empirically determined.
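The limiting argument can be illustrated numerically. The sketch below is a simplification rather than the proof itself: it drops the $o(t/n)$ terms, so each of the $n$ subintervals is treated as containing exactly 1 event with probability $\lambda t/n$ and 0 events otherwise, and the resulting binomial probability is compared with the Poisson probability $e^{-\lambda t}(\lambda t)^k/k!$. The numerical values of `lam`, `t`, and `k` are arbitrary illustrations.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    return exp(-mean) * mean ** k / factorial(k)

lam, t, k = 2.0, 1.5, 3   # illustrative values, not from the text
for n in (10, 100, 1000, 10000):
    approx = binomial_pmf(k, n, lam * t / n)
    print(n, approx, poisson_pmf(k, lam * t))
```

As $n$ grows, the binomial column should approach the constant Poisson column.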


The preceding discussion explains why a Poisson random variable is usually a good 
approximation for such diverse phenomena as the following: 


1. The number of earthquakes occurring during some fixed time span 

2. The number of wars per year 

3. The number of electrons emitted from a heated cathode during a fixed time 
period 

4. The number of deaths, in a given period of time, of the policyholders of a life 
insurance company 


Example 7e 


Suppose that earthquakes occur in the western portion of the United States in 
accordance with assumptions 1, 2, and 3, with $\lambda = 2$ and with 1 week as the unit
of time. (That is, earthquakes occur in accordance with the three assumptions at 
a rate of 2 per week.) 


a. Find the probability that at least 3 earthquakes occur during the next 2 
weeks. 

b. Find the probability distribution of the time, starting from now, until the next 
earthquake. 


Solution 


a. From Equation (7.5), we have

$$\begin{aligned}
P\{N(2) \ge 3\} &= 1 - P\{N(2) = 0\} - P\{N(2) = 1\} - P\{N(2) = 2\} \\
&= 1 - e^{-4} - 4e^{-4} - \frac{4^2}{2} e^{-4} \\
&= 1 - 13e^{-4}
\end{aligned}$$


b. Let $X$ denote the amount of time (in weeks) until the next earthquake.
Because $X$ will be greater than $t$ if and only if no events occur within the
next $t$ units of time, we have, from Equation (7.5),

$$P\{X > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$


so the probability distribution function $F$ of the random variable $X$ is given
by

$$F(t) = P\{X \le t\} = 1 - P\{X > t\} = 1 - e^{-\lambda t} = 1 - e^{-2t}$$
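Both parts of Example 7e are easy to evaluate numerically. The following is a small sketch under the example's assumptions (rate 2 per week); the helper names `poisson_pmf` and `waiting_time_cdf` are ours.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    return exp(-mean) * mean ** k / factorial(k)

rate, weeks = 2.0, 2.0           # lambda = 2 per week, 2-week window
mean = rate * weeks              # N(2) is Poisson with mean 4

# Part (a): P{N(2) >= 3} = 1 - P{0} - P{1} - P{2}
p_at_least_3 = 1 - sum(poisson_pmf(k, mean) for k in range(3))
print(p_at_least_3, 1 - 13 * exp(-4))

# Part (b): distribution function of the waiting time X
def waiting_time_cdf(t, rate=2.0):
    return 1 - exp(-rate * t) if t > 0 else 0.0

print(waiting_time_cdf(1.0))     # chance of an earthquake within one week
```

The first line prints the same number twice, roughly 0.762, confirming the simplification to $1 - 13e^{-4}$.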


4.7.1 Computing the Poisson Distribution Function
If $X$ is Poisson with parameter $\lambda$, then


(7.6)
$$\frac{P\{X = i+1\}}{P\{X = i\}} = \frac{e^{-\lambda} \lambda^{i+1}/(i+1)!}{e^{-\lambda} \lambda^i / i!} = \frac{\lambda}{i+1}$$


Starting with $P\{X = 0\} = e^{-\lambda}$, we can use (7.6) to compute successively

$$\begin{aligned}
P\{X = 1\} &= \lambda P\{X = 0\} \\
P\{X = 2\} &= \frac{\lambda}{2} P\{X = 1\} \\
&\;\;\vdots \\
P\{X = i+1\} &= \frac{\lambda}{i+1} P\{X = i\}
\end{aligned}$$


We can use a module to compute the Poisson probabilities for Equation (7.6).


Example 7f 

a. Determine $P\{X \le 90\}$ when $X$ is Poisson with mean 100.

b. Determine $P\{Y \le 1075\}$ when $Y$ is Poisson with mean 1000.

Solution


Using the Poisson calculator of StatCrunch yields the solutions:


a. $P\{X \le 90\} \approx .17138$
b. $P\{Y \le 1075\} \approx .99095$
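The recursion of Equation (7.6) can also be coded directly, without a statistical package. The sketch below is one possible implementation, not the calculator used in the text. It carries the recursion on the logarithmic scale, a design choice made here because $e^{-1000}$ underflows to zero in double-precision arithmetic, which would make a naive start from $P\{X = 0\}$ useless for part (b).

```python
from math import exp, log

def poisson_cdf(x, mean):
    """P{X <= x} for X Poisson(mean), built from the ratio in Equation (7.6)."""
    log_p = -mean            # log P{X = 0}
    total = exp(log_p)
    for i in range(x):
        # log P{X = i+1} = log P{X = i} + log(mean / (i + 1))
        log_p += log(mean) - log(i + 1)
        total += exp(log_p)
    return total

print(round(poisson_cdf(90, 100), 5))
print(round(poisson_cdf(1075, 1000), 5))
```

The printed values should be close to the ones reported in the solution above.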


4.8 Other Discrete Probability Distributions 


4.8.1 The Geometric Random Variable 


Suppose that independent trials, each having a probability $p$, $0 < p < 1$, of being a
success, are performed until a success occurs. If we let $X$ equal the number of trials
required, then


(8.1)
$$P\{X = n\} = (1-p)^{n-1} p \qquad n = 1, 2, \ldots$$


Equation (8.1) follows because, in order for $X$ to equal $n$, it is necessary and
sufficient that the first $n-1$ trials are failures and the $n$th trial is a success. Equation
(8.1) then follows, since the outcomes of the successive trials are assumed to be
independent.


Since

$$\sum_{n=1}^{\infty} P\{X = n\} = p \sum_{n=1}^{\infty} (1-p)^{n-1} = \frac{p}{1-(1-p)} = 1$$

it follows that with probability 1, a success will eventually occur. Any random variable
$X$ whose probability mass function is given by Equation (8.1) is said to be a
geometric random variable with parameter $p$.


Example 8a 


An urn contains N white and M black balls. Balls are randomly selected, one at a 
time, until a black one is obtained. If we assume that each ball selected is 
replaced before the next one is drawn, what is the probability that 


a. exactly n draws are needed? 
b. at least k draws are needed? 


Solution 


If we let $X$ denote the number of draws needed to select a black ball, then $X$
satisfies Equation (8.1) with $p = M/(M+N)$. Hence,


a.
$$P\{X = n\} = \left(\frac{N}{M+N}\right)^{n-1} \frac{M}{M+N} = \frac{M N^{n-1}}{(M+N)^n}$$

b.
$$P\{X \ge k\} = \frac{M}{M+N} \sum_{n=k}^{\infty} \left(\frac{N}{M+N}\right)^{n-1} = \frac{\frac{M}{M+N}\left(\frac{N}{M+N}\right)^{k-1}}{1 - \frac{N}{M+N}} = \left(\frac{N}{M+N}\right)^{k-1}$$

Of course, part (b) could have been obtained directly, since the probability that at
least $k$ trials are necessary to obtain a success is equal to the probability that the
first $k-1$ trials are all failures. That is, for a geometric random variable,


$$P\{X \ge k\} = (1-p)^{k-1}$$
Example 8b 
Find the expected value of a geometric random variable. 


Solution 


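With $q = 1 - p$, we have

$$\begin{aligned}
E[X] &= \sum_{i=1}^{\infty} i q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i - 1 + 1) q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i-1) q^{i-1} p + \sum_{i=1}^{\infty} q^{i-1} p \\
&= \sum_{j=0}^{\infty} j q^j p + 1 \\
&= q \sum_{j=1}^{\infty} j q^{j-1} p + 1 \\
&= qE[X] + 1
\end{aligned}$$

Hence,

$$pE[X] = 1$$

yielding the result

$$E[X] = \frac{1}{p}$$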


In other words, if independent trials having a common probability p of being 
successful are performed until the first success occurs, then the expected 
number of required trials equals 1/p. For instance, the expected number of rolls 
of a fair die that it takes to obtain the value 1 is 6. 


Example 8c 


Find the variance of a geometric random variable. 


Solution 


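To determine $\mathrm{Var}(X)$, let us first compute $E[X^2]$. With $q = 1 - p$, we have

$$\begin{aligned}
E[X^2] &= \sum_{i=1}^{\infty} i^2 q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i - 1 + 1)^2 q^{i-1} p \\
&= \sum_{i=1}^{\infty} (i-1)^2 q^{i-1} p + \sum_{i=1}^{\infty} 2(i-1) q^{i-1} p + \sum_{i=1}^{\infty} q^{i-1} p \\
&= \sum_{j=0}^{\infty} j^2 q^j p + 2\sum_{j=1}^{\infty} j q^j p + 1 \\
&= qE[X^2] + 2qE[X] + 1
\end{aligned}$$

Using $E[X] = 1/p$, the equation for $E[X^2]$ yields


$$pE[X^2] = \frac{2q}{p} + 1$$


Hence,

$$E[X^2] = \frac{2q + p}{p^2} = \frac{q+1}{p^2}$$

giving the result

$$\mathrm{Var}(X) = \frac{q+1}{p^2} - \frac{1}{p^2} = \frac{q}{p^2} = \frac{1-p}{p^2}$$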
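The formulas $E[X] = 1/p$ and $\mathrm{Var}(X) = (1-p)/p^2$ can be checked by summing the geometric mass function directly. The sketch below truncates the infinite sums at a large number of terms, which is an approximation of our own choosing; the neglected tail is negligible for the value of $p$ used.

```python
def geometric_moments(p, terms=10_000):
    """Approximate E[X] and Var(X) by truncating the sums over n = 1, 2, ..."""
    q = 1 - p
    mean = sum(n * q ** (n - 1) * p for n in range(1, terms + 1))
    second = sum(n * n * q ** (n - 1) * p for n in range(1, terms + 1))
    return mean, second - mean ** 2

p = 1 / 6                       # e.g. rolling a fair die until a 1 appears
mean, var = geometric_moments(p)
print(mean, 1 / p)              # both close to 6
print(var, (1 - p) / p ** 2)    # both close to 30
```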


4.8.2 The Negative Binomial Random Variable 


Suppose that independent trials, each having probability $p$, $0 < p < 1$, of being a
success are performed until a total of $r$ successes is accumulated. If we let $X$ equal
the number of trials required, then


(8.2)
$$P\{X = n\} = \binom{n-1}{r-1} p^r (1-p)^{n-r} \qquad n = r, r+1, \ldots$$


Equation (8.2) follows because, in order for the $r$th success to occur at the $n$th
trial, there must be $r-1$ successes in the first $n-1$ trials and the $n$th trial must be a
success. The probability of the first event is

$$\binom{n-1}{r-1} p^{r-1} (1-p)^{n-r}$$


and the probability of the second is $p$; thus, by independence, Equation (8.2) is
established. To verify that a total of $r$ successes must eventually be accumulated,
either we can prove analytically that


(8.3)
$$\sum_{n=r}^{\infty} P\{X = n\} = \sum_{n=r}^{\infty} \binom{n-1}{r-1} p^r (1-p)^{n-r} = 1$$


or we can give a probabilistic argument as follows: The number of trials required to
obtain $r$ successes can be expressed as $Y_1 + Y_2 + \cdots + Y_r$, where $Y_1$ equals the
number of trials required for the first success, $Y_2$ the number of additional trials after
the first success until the second success occurs, $Y_3$ the number of additional trials
until the third success, and so on. Because the trials are independent and all have
the same probability of success, it follows that $Y_1, Y_2, \ldots, Y_r$ are all geometric random
variables. Hence, each is finite with probability 1, so $\sum_{i=1}^{r} Y_i$ must also be finite,
establishing Equation (8.3).


Any random variable $X$ whose probability mass function is given by Equation
(8.2) is said to be a negative binomial random variable with parameters $(r, p)$.
Note that a geometric random variable is just a negative binomial with parameter $(1, p)$.


In the next example, we use the negative binomial to obtain another solution of the 
problem of the points. 


Example 8d 


If independent trials, each resulting in a success with probability p, are 
performed, what is the probability of r successes occurring before s failures? 


Solution 


The solution will be arrived at by noting that $r$ successes will occur before $s$
failures if and only if the $r$th success occurs no later than the $(r+s-1)$th trial. This
follows because if the $r$th success occurs before or at the $(r+s-1)$th trial, then it
must have occurred before the $s$th failure, and conversely. Hence, from Equation
(8.2), the desired probability is

$$\sum_{n=r}^{r+s-1} \binom{n-1}{r-1} p^r (1-p)^{n-r}$$
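The probability that $r$ successes occur before $s$ failures is the negative binomial mass function summed over $n = r, \ldots, r+s-1$. The sketch below evaluates that sum and cross-checks it against an equivalent binomial statement (at least $r$ successes among the first $r+s-1$ trials). Both function names, and the cross-check itself, are our own additions.

```python
from math import comb

def prob_r_successes_before_s_failures(r, s, p):
    """Sum the negative binomial mass function over n = r, ..., r + s - 1."""
    return sum(
        comb(n - 1, r - 1) * p ** r * (1 - p) ** (n - r)
        for n in range(r, r + s)
    )

def same_probability_via_binomial(r, s, p):
    """Cross-check: at least r successes in the first r + s - 1 trials."""
    m = r + s - 1
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(r, m + 1))

print(prob_r_successes_before_s_failures(2, 3, 0.5))
print(same_probability_via_binomial(2, 3, 0.5))
```

For $r = 2$, $s = 3$, $p = 1/2$ both computations give $11/16$.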
Example 8e The Banach match problem


At all times, a pipe-smoking mathematician carries 2 matchboxes: 1 in his left-hand
pocket and 1 in his right-hand pocket. Each time he needs a match, he is
equally likely to take it from either pocket. Consider the moment when the 
mathematician first discovers that one of his matchboxes is empty. If it is 
assumed that both matchboxes initially contained N matches, what is the 
probability that there are exactly k matches, k = 0,1,...,N, in the other box? 


Solution 


Let $E$ denote the event that the mathematician first discovers that the right-hand
matchbox is empty and that there are $k$ matches in the left-hand box at the time.
Now, this event will occur if and only if the $(N+1)$th choice of the right-hand
matchbox is made at the $(N+1+N-k)$th trial. Hence, from Equation (8.2)
(with $p = \frac{1}{2}$, $r = N+1$, and $n = 2N-k+1$), we see that


$$P(E) = \binom{2N-k}{N} \left(\frac{1}{2}\right)^{2N-k+1}$$


Since there is an equal probability that it is the left-hand box that is first
discovered to be empty and there are $k$ matches in the right-hand box at that
time, the desired result is

$$2P(E) = \binom{2N-k}{N} \left(\frac{1}{2}\right)^{2N-k}$$
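A quick sanity check on the Banach match result, $2P(E) = \binom{2N-k}{N}(1/2)^{2N-k}$, is that these probabilities add to 1 over $k = 0, 1, \ldots, N$. The following sketch verifies this with exact fractions; the box size `N = 10` is an arbitrary illustration.

```python
from fractions import Fraction
from math import comb

def banach_pmf(k, N):
    """P{other box holds k matches when one box is first found empty}."""
    return comb(2 * N - k, N) * Fraction(1, 2) ** (2 * N - k)

N = 10                                   # illustrative box size
pmf = [banach_pmf(k, N) for k in range(N + 1)]
print(sum(pmf))                          # the probabilities should add to 1
print(float(pmf[0]))                     # chance the other box is also empty
```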


Example 8f 


Compute the expected value and the variance of a negative binomial random 
variable with parameters r and p. 


Solution 


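We have

$$\begin{aligned}
E[X^k] &= \sum_{n=r}^{\infty} n^k \binom{n-1}{r-1} p^r (1-p)^{n-r} \\
&= \frac{r}{p} \sum_{n=r}^{\infty} n^{k-1} \binom{n}{r} p^{r+1} (1-p)^{n-r} \qquad \text{since } n\binom{n-1}{r-1} = r\binom{n}{r} \\
&= \frac{r}{p} \sum_{m=r+1}^{\infty} (m-1)^{k-1} \binom{m-1}{r} p^{r+1} (1-p)^{m-(r+1)} \qquad \text{by setting } m = n+1 \\
&= \frac{r}{p} E[(Y-1)^{k-1}]
\end{aligned}$$


where $Y$ is a negative binomial random variable with parameters $r+1$, $p$. Setting
$k = 1$ in the preceding equation yields

$$E[X] = \frac{r}{p}$$

Setting $k = 2$ in the equation for $E[X^k]$ and using the formula for the expected
value of a negative binomial random variable gives

$$E[X^2] = \frac{r}{p} E[Y - 1] = \frac{r}{p}\left(\frac{r+1}{p} - 1\right)$$


Therefore,

$$\mathrm{Var}(X) = \frac{r}{p}\left(\frac{r+1}{p} - 1\right) - \left(\frac{r}{p}\right)^2 = \frac{r(1-p)}{p^2}$$


Thus, from Example 8f, if independent trials, each of which is a success with
probability $p$, are performed, then the expected value and variance of the number of
trials that it takes to amass $r$ successes is $r/p$ and $r(1-p)/p^2$, respectively.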


Since a geometric random variable is just a negative binomial with parameter $r = 1$,
it follows from the preceding example that the variance of a geometric random
variable with parameter $p$ is equal to $(1-p)/p^2$, which checks with the result of
Example 8c.


Example 8g 


Find the expected value and the variance of the number of times one must throw
a die until the outcome 1 has occurred 4 times.


Solution
Since the random variable of interest is a negative binomial with parameters
$r = 4$ and $p = \frac{1}{6}$, it follows that

$$E[X] = 24$$
$$\mathrm{Var}(X) = \frac{4\left(\frac{5}{6}\right)}{\left(\frac{1}{6}\right)^2} = 120$$

Now, let us suppose that the independent trials are not ended when there have been 
a total of r successes, but that they continue on. Aside from X, the number of trials 
until there have been r successes, some other random variables of interest are, for 
s>0, 


Y : the number of trials until there have been s failures; 
V : the number of trials until there have been either r successes or s failures; 
Z : the number of trials until there have been both at least r successes and at least s 


failures. 


Because each trial is independently a failure with probability $1-p$, it follows that $Y$ is
a negative binomial random variable with probability mass function


$$P(Y = n) = \binom{n-1}{s-1} (1-p)^s p^{n-s}, \qquad n \ge s$$


To determine the probability mass function of $V = \min(X, Y)$, note that the possible
values of $V$ are all less than $r+s$. Suppose $n < r+s$. If either the $r$th success or the
$s$th failure occurs at time $n$ then, because $n < r+s$, the other event would not yet
have occurred. Consequently, $V$ will equal $n$ if either $X$ or $Y$ is equal to $n$. Because we
cannot have both that $X = n$ and that $Y = n$, this yields


$$\begin{aligned}
P(V = n) &= P(X = n) + P(Y = n) \\
&= \binom{n-1}{r-1} p^r (1-p)^{n-r} + \binom{n-1}{s-1} (1-p)^s p^{n-s}, \qquad n < r+s
\end{aligned}$$


To determine the probability mass function of $Z = \max(X, Y)$, note that $Z \ge r+s$. For
$n \ge r+s$, if either the $r$th success or the $s$th failure occurs at time $n$ then the other
event must have already occurred by time $n$. Consequently, for $n \ge r+s$, $Z$ will equal
$n$ if either $X$ or $Y$ is equal to $n$. This gives


$$\begin{aligned}
P(Z = n) &= P(X = n) + P(Y = n) \\
&= \binom{n-1}{r-1} p^r (1-p)^{n-r} + \binom{n-1}{s-1} (1-p)^s p^{n-s}, \qquad n \ge r+s
\end{aligned}$$

4.8.3 The Hypergeometric Random Variable 


Suppose that a sample of size $n$ is to be chosen randomly (without replacement)
from an urn containing $N$ balls, of which $m$ are white and $N - m$ are black. If we let $X$
denote the number of white balls selected, then


(8.4)
$$P\{X = i\} = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}} \qquad i = 0, 1, \ldots, n$$


A random variable $X$ whose probability mass function is given by Equation (8.4)
for some values of $n$, $N$, $m$ is said to be a hypergeometric random variable.


Remark Although we have written the hypergeometric probability mass function with
$i$ going from 0 to $n$, $P\{X = i\}$ will actually be 0, unless $i$ satisfies the inequalities
$n - (N - m) \le i \le \min(n, m)$. However, Equation (8.4) is always valid because of
our convention that $\binom{r}{k}$ is equal to 0 when either $k < 0$ or $r < k$.


Example 8h 


An unknown number, say, N, of animals inhabit a certain region. To obtain some 
information about the size of the population, ecologists often perform the 
following experiment: They first catch a number, say, m, of these animals, mark 
them in some manner, and release them. After allowing the marked animals time 
to disperse throughout the region, a new catch of size, say, n, is made. Let X 
denote the number of marked animals in this second capture. If we assume that 
the population of animals in the region remained fixed between the time of the 
two catches and that each time an animal was caught it was equally likely to be 
any of the remaining uncaught animals, it follows that X is a hypergeometric 
random variable such that

$$P\{X = i\} = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}} \equiv P_i(N)$$

Suppose now that $X$ is observed to equal $i$. Then, since $P_i(N)$ represents the
probability of the observed event when there are actually $N$ animals present in
the region, it would appear that a reasonable estimate of $N$ would be the value of
$N$ that maximizes $P_i(N)$. Such an estimate is called a maximum likelihood
estimate. (See Theoretical Exercises 13 and 18 for other examples of this
type of estimation procedure.)


The maximization of $P_i(N)$ can be done most simply by first noting that


$$\frac{P_i(N)}{P_i(N-1)} = \frac{(N-m)(N-n)}{N(N-m-n+i)}$$


Now, the preceding ratio is at least 1 if and only if


$$(N-m)(N-n) \ge N(N-m-n+i)$$


or, equivalently, if and only if

$$N \le \frac{mn}{i}$$

Thus, $P_i(N)$ is first increasing and then decreasing and reaches its maximum
value at the largest integral value not exceeding $mn/i$. This value is the maximum
likelihood estimate of $N$. For example, suppose that the initial catch consisted of
$m = 50$ animals, which are marked and then released. If a subsequent catch
consists of $n = 40$ animals of which $i = 4$ are marked, then we would estimate
that there are some 500 animals in the region. (Note that the preceding estimate
could also have been obtained by assuming that the proportion of marked
animals in the region, $m/N$, is approximately equal to the proportion of marked
animals in our second catch, $i/n$.)
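The maximum likelihood estimate in the capture-recapture example can be confirmed by evaluating the hypergeometric likelihood over a range of candidate population sizes. The sketch below uses exact fractions so that ties are detected reliably; the upper end of the search window (2000) is an arbitrary choice of ours.

```python
from fractions import Fraction
from math import comb

def likelihood(N, m, n, i):
    """P_i(N): chance of i marked animals in a catch of n, population N."""
    if N < m + n - i:
        return Fraction(0)
    return Fraction(comb(m, i) * comb(N - m, n - i), comb(N, n))

m, n, i = 50, 40, 4
estimate = (m * n) // i                     # largest integer not exceeding mn/i
candidates = range(m + n - i, 2001)         # search window is an arbitrary choice
best = max(likelihood(N, m, n, i) for N in candidates)
print(estimate, likelihood(estimate, m, n, i) == best)
```

When $mn/i$ is an integer, as it is here, the likelihood ratio equals exactly 1 at $N = mn/i$, so $N = 499$ and $N = 500$ attain the same maximal likelihood.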


Example 8i 


A purchaser of electrical components buys them in lots of size 10. It is his policy 
to inspect 3 components randomly from a lot and to accept the lot only if all 3 are 
nondefective. If 30 percent of the lots have 4 defective components and 70 
percent have only 1, what proportion of lots does the purchaser reject? 


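Solution

Let $A$ denote the event that the purchaser accepts a lot. Now,


$$\begin{aligned}
P(A) &= P(A \mid \text{lot has 4 defectives}) \frac{3}{10} + P(A \mid \text{lot has 1 defective}) \frac{7}{10} \\
&= \frac{\binom{4}{0}\binom{6}{3}}{\binom{10}{3}} \left(\frac{3}{10}\right) + \frac{\binom{1}{0}\binom{9}{3}}{\binom{10}{3}} \left(\frac{7}{10}\right) \\
&= \frac{54}{100}
\end{aligned}$$


Hence, 46 percent of the lots are rejected.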


If n balls are randomly chosen without replacement from a set of N balls of which the 
fraction p = m/N is white, then the number of white balls selected is hypergeometric. 
Now, it would seem that when m and N are large in relation to n, it shouldn’t make 
much difference whether the selection is being done with or without replacement, 
because, no matter which balls have previously been selected, when m and N are 
large, each additional selection will be white with a probability approximately equal to 
p. In other words, it seems intuitive that when m and N are large in relation to n, the 
probability mass function of X should approximately be that of a binomial random 
variable with parameters n and p. To verify this intuition, note that if X is 
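hypergeometric, then, for $i \le n$,


$$\begin{aligned}
P\{X = i\} &= \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}} \\
&= \frac{m!}{(m-i)!\, i!} \, \frac{(N-m)!}{(N-m-n+i)!\,(n-i)!} \, \frac{(N-n)!\, n!}{N!} \\
&= \binom{n}{i} \frac{m}{N} \frac{m-1}{N-1} \cdots \frac{m-i+1}{N-i+1} \, \frac{N-m}{N-i} \frac{N-m-1}{N-i-1} \cdots \frac{N-m-(n-i-1)}{N-i-(n-i-1)} \\
&\approx \binom{n}{i} p^i (1-p)^{n-i}
\end{aligned}$$

when $p = m/N$ and $m$ and $N$ are large in relation to $n$ and $i$.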
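The quality of the binomial approximation to the hypergeometric can be seen numerically by holding $n$ and $p = m/N$ fixed while letting $N$ grow. The following sketch reports the largest discrepancy between the two mass functions; the values of `n`, `p`, and the population sizes are illustrative only.

```python
from math import comb

def hypergeom_pmf(i, n, N, m):
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

def binomial_pmf(i, n, p):
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

n, p = 5, 0.3                                # illustrative values
for N in (20, 200, 2000, 20000):
    m = round(p * N)                          # number of white balls
    gap = max(abs(hypergeom_pmf(i, n, N, m) - binomial_pmf(i, n, p))
              for i in range(n + 1))
    print(N, gap)                             # largest pmf discrepancy shrinks as N grows
```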
Example 8j 


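Determine the expected value and the variance of $X$, a hypergeometric random
variable with parameters $n$, $N$, and $m$.

Solution

$$E[X^k] = \sum_{i=0}^{n} i^k P\{X = i\} = \sum_{i=1}^{n} i^k \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$

Using the identities

$$i\binom{m}{i} = m\binom{m-1}{i-1} \quad \text{and} \quad n\binom{N}{n} = N\binom{N-1}{n-1}$$

we obtain


$$\begin{aligned}
E[X^k] &= \frac{nm}{N} \sum_{i=1}^{n} i^{k-1} \frac{\binom{m-1}{i-1}\binom{N-m}{n-i}}{\binom{N-1}{n-1}} \\
&= \frac{nm}{N} \sum_{j=0}^{n-1} (j+1)^{k-1} \frac{\binom{m-1}{j}\binom{N-m}{n-1-j}}{\binom{N-1}{n-1}} \\
&= \frac{nm}{N} E[(Y+1)^{k-1}]
\end{aligned}$$


where $Y$ is a hypergeometric random variable with parameters $n-1$, $N-1$, and
$m-1$. Hence, upon setting $k = 1$, we have

$$E[X] = \frac{nm}{N}$$

In words, if $n$ balls are randomly selected from a set of $N$ balls, of which $m$ are
white, then the expected number of white balls selected is $nm/N$.


Upon setting $k = 2$ in the equation for $E[X^k]$, we obtain


$$E[X^2] = \frac{nm}{N} E[Y+1] = \frac{nm}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1\right]$$


where the final equality uses our preceding result to compute the expected value
of the hypergeometric random variable $Y$.


Because $E[X] = nm/N$, we can conclude that

$$\mathrm{Var}(X) = \frac{nm}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1 - \frac{nm}{N}\right]$$


Letting $p = m/N$ and using the identity


$$\frac{m-1}{N-1} = \frac{Np-1}{N-1} = p - \frac{1-p}{N-1}$$


shows that


$$\mathrm{Var}(X) = np\left[(n-1)p - (n-1)\frac{1-p}{N-1} + 1 - np\right] = np(1-p)\left(1 - \frac{n-1}{N-1}\right)$$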

Remark We have shown in Example 8j that if $n$ balls are randomly selected
without replacement from a set of $N$ balls, of which the fraction $p$ are white, then the
expected number of white balls chosen is $np$. In addition, if $N$ is large in relation to $n$
[so that $(N-n)/(N-1)$ is approximately equal to 1], then


$$\mathrm{Var}(X) \approx np(1-p)$$


In other words, E[X] is the same as when the selection of the balls is done with 
replacement (so that the number of white balls is binomial with parameters n and p), 
and if the total collection of balls is large, then Var(X) is approximately equal to what 
it would be if the selection were done with replacement. This is, of course, exactly 
what we would have guessed, given our earlier result that when the number of balls 
in the urn is large, the number of white balls chosen approximately has the mass 
function of a binomial random variable. 


4.8.4 The Zeta (or Zipf) Distribution 


A random variable is said to have a zeta (sometimes called the Zipf) distribution if its
probability mass function is given by


$$P\{X = k\} = \frac{C}{k^{\alpha+1}} \qquad k = 1, 2, \ldots$$


for some value of $\alpha > 0$. Since the sum of the foregoing probabilities must equal 1, it
follows that

$$C = \left[\sum_{k=1}^{\infty} \left(\frac{1}{k}\right)^{\alpha+1}\right]^{-1}$$

The zeta distribution owes its name to the fact that the function


$$\zeta(s) = 1 + \left(\frac{1}{2}\right)^s + \left(\frac{1}{3}\right)^s + \cdots + \left(\frac{1}{k}\right)^s + \cdots$$

is known in mathematical disciplines as the Riemann zeta function (after the German
mathematician G. F. B. Riemann).


The zeta distribution was used by the Italian economist V. Pareto to describe the 
distribution of family incomes in a given country. However, it was G. K. Zipf who 
applied the zeta distribution to a wide variety of problems in different areas and, in doing
so, popularized its use. 


4.9 Expected Value of Sums of Random 
Variables 


A very important property of expectations is that the expected value of a sum of 
random variables is equal to the sum of their expectations. In this section, we will 
prove this result under the assumption that the set of possible values of the 
probability experiment—that is, the sample space S—is either finite or countably 
infinite. Although the result is true without this assumption (and a proof is outlined in 
the theoretical exercises), not only will the assumption simplify the argument, but it 
will also result in an enlightening proof that will add to our intuition about 
expectations. So, for the remainder of this section, suppose that the sample space S 
is either a finite or a countably infinite set. 


For a random variable X, let X(s) denote the value of X when s € S is the outcome of 
the experiment. Now, if X and Y are both random variables, then so is their sum. That 
is, $Z = X + Y$ is also a random variable. Moreover, $Z(s) = X(s) + Y(s)$.


Example 9a 


Suppose that the experiment consists of flipping a coin 5 times, with the outcome 
being the resulting sequence of heads and tails. Suppose X is the number of 
heads in the first 3 flips and Y is the number of heads in the final 2 flips. Let 
$Z = X + Y$. Then, for instance, for the outcome $s = (h, t, h, t, h)$,

$$\begin{aligned}
X(s) &= 2 \\
Y(s) &= 1 \\
Z(s) &= X(s) + Y(s) = 3
\end{aligned}$$


meaning that the outcome $(h, t, h, t, h)$ results in 2 heads in the first three flips, 1
head in the final two flips, and a total of 3 heads in the five flips.


Let $p(s) = P(\{s\})$ be the probability that $s$ is the outcome of the experiment. Because
we can write any event $A$ as the finite or countably infinite union of the mutually
exclusive events $\{s\}$, $s \in A$, it follows by the axioms of probability that


$$P(A) = \sum_{s \in A} p(s)$$


When $A = S$, the preceding equation gives


$$1 = \sum_{s \in S} p(s)$$


Now, let $X$ be a random variable, and consider $E[X]$. Because $X(s)$ is the value of $X$
when $s$ is the outcome of the experiment, it seems intuitive that $E[X]$, the weighted
average of the possible values of $X$, with each value weighted by the probability that
$X$ assumes that value, should equal a weighted average of the values $X(s)$, $s \in S$,
with $X(s)$ weighted by the probability that $s$ is the outcome of the experiment. We
now prove this intuition.


Proposition 9.1
$$E[X] = \sum_{s \in S} X(s) p(s)$$


Proof Suppose that the distinct values of $X$ are $x_i$, $i \ge 1$. For each $i$, let $S_i$ be the
event that $X$ is equal to $x_i$. That is, $S_i = \{s : X(s) = x_i\}$. Then,

$$\begin{aligned}
E[X] &= \sum_i x_i P\{X = x_i\} \\
&= \sum_i x_i P(S_i) \\
&= \sum_i x_i \sum_{s \in S_i} p(s) \\
&= \sum_i \sum_{s \in S_i} x_i p(s) \\
&= \sum_i \sum_{s \in S_i} X(s) p(s) \\
&= \sum_{s \in S} X(s) p(s)
\end{aligned}$$

where the final equality follows because $S_1, S_2, \ldots$ are mutually exclusive events
whose union is $S$.


Example 9b 


Suppose that two independent flips of a coin that comes up heads with 
probability p are made, and let X denote the number of heads obtained. Because 


P(X=0) = P(t,t)=(1-p)*, 
P(X=1) = P(h,t)+P(t,h) = 2p(1—p) 
P(X=2) = P(h,h) =p? 


it follows from the definition of expected value that 


E[X] =0-(1—p)*+1-2p(1 — p) +2: p? = 2p 


which agrees with 


\[
\begin{aligned}
E[X] &= X(h, h)\, p^2 + X(h, t)\, p(1 - p) + X(t, h)(1 - p)p + X(t, t)(1 - p)^2 \\
     &= 2p^2 + p(1 - p) + (1 - p)p \\
     &= 2p
\end{aligned}
\]
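The agreement between the two ways of computing E[X] in Example 9b can be checked with exact arithmetic. The following is a minimal sketch; the function names and the value p = 1/3 are illustrative assumptions, not part of the example.

```python
from fractions import Fraction
from itertools import product

def expectation_by_values(p):
    """E[X] as a sum over the values of X, using the pmf from Example 9b."""
    pmf = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    return sum(x * prob for x, prob in pmf.items())

def expectation_by_outcomes(p):
    """E[X] as a sum over outcomes s of X(s) p(s), as in Proposition 9.1."""
    total = 0
    for s in product("ht", repeat=2):
        heads = s.count("h")                        # X(s)
        prob = p ** heads * (1 - p) ** (2 - heads)  # p(s), by independence
        total += heads * prob
    return total

p = Fraction(1, 3)  # illustrative value of p (an assumption)
print(expectation_by_values(p), expectation_by_outcomes(p))
```

Both functions return 2p; the second never groups outcomes by the value of X, which is exactly the content of Proposition 9.1.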


We now prove the important and useful result that the expected value of a sum of 
random variables is equal to the sum of their expectations. 


Corollary 9.2 


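For random variables X_1, X_2, ..., X_n,

\[
E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]
\]


Proof Let Z = X_1 + X_2 + ... + X_n. Then, by Proposition 9.1,

\[
\begin{aligned}
E[Z] &= \sum_{s \in S} Z(s)\, p(s) \\
     &= \sum_{s \in S} \bigl(X_1(s) + X_2(s) + \cdots + X_n(s)\bigr)\, p(s) \\
     &= \sum_{s \in S} X_1(s)\, p(s) + \sum_{s \in S} X_2(s)\, p(s) + \cdots + \sum_{s \in S} X_n(s)\, p(s) \\
     &= E[X_1] + E[X_2] + \cdots + E[X_n]
\end{aligned}
\]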


Example 9c 


Find the expected value of the sum obtained when n fair dice are rolled. 


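Solution


Let X be the sum. We will compute E[X] by using the representation

\[
X = \sum_{i=1}^{n} X_i
\]

where X_i is the upturned value on die i. Because X_i is equally likely to be any of
the values from 1 to 6, it follows that

\[
E[X_i] = \sum_{j=1}^{6} j \cdot \frac{1}{6} = \frac{21}{6} = \frac{7}{2}
\]

which, by Corollary 9.2, yields

\[
E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = 3.5n
\]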
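For small n, the result of Example 9c can be confirmed by brute force. The sketch below (function names are illustrative) averages the sum over all 6^n equally likely outcomes and compares it with n times the single-die mean 7/2 supplied by Corollary 9.2.

```python
from fractions import Fraction
from itertools import product

def expected_sum_by_enumeration(n):
    """Average the sum over all 6^n equally likely outcomes of n fair dice."""
    outcomes = product(range(1, 7), repeat=n)
    return Fraction(sum(sum(o) for o in outcomes), 6 ** n)

def expected_sum_by_linearity(n):
    """n times the mean of a single die, E[X_i] = 21/6 = 7/2."""
    single = Fraction(sum(range(1, 7)), 6)
    return n * single

for n in range(1, 5):
    print(n, expected_sum_by_enumeration(n), expected_sum_by_linearity(n))
```

Enumeration costs 6^n evaluations, whereas the linearity argument needs only one single-die mean, which is why the representation as a sum is so useful.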


Example 9d 


Find the expected total number of successes that result from n trials when trial i 
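is a success with probability p_i, i = 1, ..., n.


Solution


Letting

\[
X_i =
\begin{cases}
1, & \text{if trial } i \text{ is a success} \\
0, & \text{if trial } i \text{ is a failure}
\end{cases}
\]

we have the representation

\[
X = \sum_{i=1}^{n} X_i
\]

Consequently,

\[
E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p_i
\]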


Note that this result does not require that the trials be independent. It includes as 
a special case the expected value of a binomial random variable, which assumes 
independent trials and all p_i = p, and thus has mean np. It also gives the
expected value of a hypergeometric random variable representing the number of 
white balls selected when n balls are randomly selected, without replacement, 
from an urn of N balls of which m are white. We can interpret the hypergeometric 
as representing the number of successes in n trials, where trial i is said to be a
success if the ith ball selected is white. Because the ith ball selected is equally 
likely to be any of the N balls and thus has probability m/N of being white, it 
follows that the hypergeometric is the number of successes in n trials in which 
each trial is a success with probability p = m/N. Hence, even though these 
hypergeometric trials are dependent, it follows from the result of Example 9d 
that the expected value of the hypergeometric is np = nm/N. 
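The key step in the hypergeometric argument, that the ith ball drawn is white with probability m/N regardless of i, can be verified by enumerating every ordering of a small urn. This is a sketch under illustrative assumptions: the function names and the urn sizes used in the usage line are not from the text.

```python
from fractions import Fraction
from itertools import permutations

def prob_ith_ball_white(N, m, i):
    """P(ith ball drawn is white), by enumerating all orderings of the N balls."""
    balls = [1] * m + [0] * (N - m)          # 1 = white, 0 = not white
    orders = list(permutations(range(N)))    # each ordering is equally likely
    hits = sum(balls[order[i - 1]] for order in orders)
    return Fraction(hits, len(orders))

def expected_white(N, m, n):
    """E[X] as the sum of the success probabilities of the n draws."""
    return sum(prob_ith_ball_white(N, m, i) for i in range(1, n + 1))

# Hypothetical small urn: N = 5 balls, m = 2 white, n = 3 drawn.
print([prob_ith_ball_white(5, 2, i) for i in (1, 2, 3)], expected_white(5, 2, 3))
```

Every draw has the same marginal success probability even though the draws are dependent, so the sum of the p_i is nm/N.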


Example 9e 


Derive an expression for the variance of the number of successful trials in 
Example 9d, and apply it to obtain the variance of a binomial random variable
with parameters n and p, and of a hypergeometric random variable equal to the 
number of white balls chosen when n balls are randomly chosen from an urn 
containing N balls of which m are white. 


Solution 


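Letting X be the number of successful trials, and using the same representation
for X as in the previous example, namely X = X_1 + ... + X_n, we have

\[
\begin{aligned}
E[X^2] &= E\left[\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j\right)\right] \\
       &= E\left[\sum_{i=1}^{n} X_i \left(X_i + \sum_{j \neq i} X_j\right)\right] \\
       &= E\left[\sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \sum_{j \neq i} X_i X_j\right] \\
       &= \sum_{i=1}^{n} E[X_i^2] + \sum_{i=1}^{n} \sum_{j \neq i} E[X_i X_j] \\
       &= \sum_{i=1}^{n} p_i + \sum_{i=1}^{n} \sum_{j \neq i} E[X_i X_j]
\end{aligned}
\]

where the final equation used that X_i^2 = X_i. However, because the possible
values of both X_i and X_j are 0 or 1, it follows that

\[
X_i X_j =
\begin{cases}
1, & \text{if } X_i = 1,\ X_j = 1 \\
0, & \text{otherwise}
\end{cases}
\]

Hence,

\[
E[X_i X_j] = P\{X_i = 1, X_j = 1\} = P(\text{trials } i \text{ and } j \text{ are successes})
\]


Thus, with p_{i,j} = P(X_i = 1, X_j = 1), the preceding and the result of Example
9d yield that


(9.1)
\[
\operatorname{Var}(X) = \sum_{i=1}^{n} p_i + \sum_{i=1}^{n} \sum_{j \neq i} p_{i,j} - \left(\sum_{i=1}^{n} p_i\right)^2
\]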


If X is binomial with parameters n, p, then p_i = p and, by the independence of
trials, p_{i,j} = p^2, i ≠ j. Consequently, Equation (9.1) yields that

\[
\operatorname{Var}(X) = np + n(n - 1)p^2 - n^2 p^2 = np(1 - p)
\]


On the other hand, if X is hypergeometric, then as each of the N balls is equally
likely to be the ith ball chosen, it follows that p_i = m/N. Also, for i ≠ j,

\[
p_{i,j} = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}\, P\{X_j = 1 \mid X_i = 1\} = \frac{m}{N} \cdot \frac{m - 1}{N - 1}
\]

which follows because given that the ith ball selected is white, each of the other
N - 1 balls, of which m - 1 are white, is equally likely to be the jth ball selected.
Consequently, (9.1) yields that

\[
\operatorname{Var}(X) = \frac{nm}{N} + n(n - 1)\, \frac{m}{N} \cdot \frac{m - 1}{N - 1} - \left(\frac{nm}{N}\right)^2
\]

which, as shown in Example 8j, can be simplified to yield

\[
\operatorname{Var}(X) = np(1 - p)\left(1 - \frac{n - 1}{N - 1}\right)
\]


where p = m/N. 
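Equation (9.1), read here as Var(X) = Σ p_i + Σ_{i≠j} p_{i,j} - (Σ p_i)^2, can be evaluated directly from the single and pairwise success probabilities. The sketch below does so for the binomial and hypergeometric cases and compares the results with the closed forms above; all function names and the parameter values in the usage lines are illustrative assumptions.

```python
from fractions import Fraction

def variance_from_9_1(p_single, p_pair):
    """Evaluate sum_i p_i + sum_{i != j} p_ij - (sum_i p_i)^2.

    p_single[i] is P(X_i = 1); p_pair[i][j] is P(X_i = 1, X_j = 1) for i != j
    (diagonal entries of p_pair are ignored).
    """
    n = len(p_single)
    first = sum(p_single)
    cross = sum(p_pair[i][j] for i in range(n) for j in range(n) if i != j)
    return first + cross - first ** 2

def binomial_inputs(n, p):
    """Independent trials: p_i = p and p_ij = p^2."""
    return [p] * n, [[p * p] * n for _ in range(n)]

def hypergeometric_inputs(n, N, m):
    """Draws without replacement: p_i = m/N and p_ij = (m/N)((m-1)/(N-1))."""
    p = Fraction(m, N)
    pij = p * Fraction(m - 1, N - 1)
    return [p] * n, [[pij] * n for _ in range(n)]

# Hypothetical parameters chosen only for illustration.
print(variance_from_9_1(*binomial_inputs(5, Fraction(1, 4))))
print(variance_from_9_1(*hypergeometric_inputs(4, 10, 3)))
```

Passing the probabilities as lists keeps the function general: it applies to any collection of success indicators, dependent or not, once p_i and p_{i,j} are known.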


10 Properties of the Cumulative Distribution Function


Recall that for the distribution function F of X, F(b) denotes the probability that the 
random variable X takes on a value that is less than or equal to b. The following are 
some properties of the cumulative distribution function (c.d.f.) F: 


1. F is a nondecreasing function; that is, if a < b, then F(a) ≤ F(b).
2. $\lim_{b \to \infty} F(b) = 1$.
3. $\lim_{b \to -\infty} F(b) = 0$.
4. F is right continuous. That is, for any b and any decreasing sequence
b_n, n ≥ 1, that converges to b, $\lim_{n \to \infty} F(b_n) = F(b)$.


Property 1 follows, as was noted in Section 4.1, because, for a < b, the event
{X ≤ a} is contained in the event {X ≤ b} and so cannot have a larger probability.
Properties 2, 3, and 4 all follow from the continuity property of probabilities (Section
2.6). For instance, to prove property 2, we note that if b_n increases to ∞, then the
events {X ≤ b_n}, n ≥ 1, are increasing events whose union is the event {X < ∞}.
Hence, by the continuity property of probabilities,

\[
\lim_{n \to \infty} P\{X \le b_n\} = P\{X < \infty\} = 1
\]


which proves property 2. 


The proof of property 3 is similar and is left as an exercise. To prove property 4, we 
note that if b_n decreases to b, then {X ≤ b_n}, n ≥ 1, are decreasing events whose
intersection is {X ≤ b}. The continuity property then yields

\[
\lim_{n \to \infty} P\{X \le b_n\} = P\{X \le b\}
\]


which verifies property 4. 


All probability questions about X can be answered in terms of the c.d.f., F. For 
example, 


(10.1)
\[
P\{a < X \le b\} = F(b) - F(a) \quad \text{for all } a < b
\]


This equation can best be seen to hold if we write the event {X ≤ b} as the union of
the mutually exclusive events {X ≤ a} and {a < X ≤ b}. That is,

\[
\{X \le b\} = \{X \le a\} \cup \{a < X \le b\}
\]

so

\[
P\{X \le b\} = P\{X \le a\} + P\{a < X \le b\}
\]

which establishes Equation (10.1).


If we want to compute the probability that X is strictly less than b, we can again apply 
the continuity property to obtain 


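\[
\begin{aligned}
P\{X < b\} &= P\left(\lim_{n \to \infty} \left\{X \le b - \frac{1}{n}\right\}\right) \\
           &= \lim_{n \to \infty} P\left\{X \le b - \frac{1}{n}\right\} \\
           &= \lim_{n \to \infty} F\left(b - \frac{1}{n}\right)
\end{aligned}
\]

Note that P{X < b} does not necessarily equal F(b), since F(b) also includes the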
probability that X equals b. 
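The distinction between F(b) and P{X < b} is easy to see numerically for a discrete random variable. The following sketch uses a hypothetical probability mass function (the values 1, 2, 4 and their probabilities are assumptions made for illustration) and builds F from it.

```python
from fractions import Fraction

# Hypothetical pmf used only for illustration (not from the text).
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}

def F(b):
    """Cumulative distribution function F(b) = P{X <= b}."""
    return sum(prob for x, prob in pmf.items() if x <= b)

def prob_interval(a, b):
    """P{a < X <= b} = F(b) - F(a), valid for a < b (Equation (10.1))."""
    return F(b) - F(a)

def prob_strictly_less(b, n=10**6):
    """Approximate P{X < b} by F(b - 1/n) for a large n."""
    return F(b - Fraction(1, n))

print(F(2), prob_strictly_less(2), F(2) - prob_strictly_less(2))
```

For this pmf the difference F(2) - P{X < 2} equals P{X = 2}, the size of the jump of F at 2. Because the possible values here are integers, F(b - 1/n) already equals the left-hand limit for any n > 1; for a general distribution the value returned is only an approximation.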


Example 10a 


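The distribution function of the random variable X is given by

\[
F(x) =
\begin{cases}
0, & x < 0 \\
\dfrac{x}{2}, & 0 \le x < 1 \\
\dfrac{2}{3}, & 1 \le x < 2 \\
\dfrac{11}{12}, & 2 \le x < 3 \\
1, & 3 \le x
\end{cases}
\]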


A graph of F(x) is presented in Figure 4.10. Compute (a) P{X < 3}, (b)
P{X = 1}, (c) P{X > 1/2}, and (d) P{2 < X ≤ 4}.


Figure 4.10 Graph of F(x).


Solution

\[
\begin{aligned}
\text{a.}\quad P\{X < 3\} &= \lim_{n \to \infty} P\left\{X \le 3 - \frac{1}{n}\right\} = \lim_{n \to \infty} F\left(3 - \frac{1}{n}\right) = \frac{11}{12} \\
\text{b.}\quad P\{X = 1\} &= P\{X \le 1\} - P\{X < 1\} \\
                          &= F(1) - \lim_{n \to \infty} F\left(1 - \frac{1}{n}\right) = \frac{2}{3} - \frac{1}{2} = \frac{1}{6} \\
\text{c.}\quad P\left\{X > \frac{1}{2}\right\} &= 1 - P\left\{X \le \frac{1}{2}\right\} = 1 - F\left(\frac{1}{2}\right) = \frac{3}{4} \\
\text{d.}\quad P\{2 < X \le 4\} &= F(4) - F(2) = \frac{1}{12}
\end{aligned}
\]


Summary


A real-valued function defined on the outcome of a probability experiment is called a 
random variable. 


If X is a random variable, then the function F(x) defined by

\[
F(x) = P\{X \le x\}
\]


is called the distribution function of X. All probabilities concerning X can be stated in 
terms of F. 


A random variable whose set of possible values is either finite or countably infinite is 
called discrete. If X is a discrete random variable, then the function 


\[
p(x) = P\{X = x\}
\]

is called the probability mass function of X. Also, the quantity E[X] defined by

\[
E[X] = \sum_{x : p(x) > 0} x\, p(x)
\]


is called the expected value of X. E[X] is also commonly called the mean or the 
expectation of X. 


A useful identity states that for a function g, 


\[
E[g(X)] = \sum_{x : p(x) > 0} g(x)\, p(x)
\]

The variance of a random variable X, denoted by Var(X), is defined by

\[
\operatorname{Var}(X) = E[(X - E[X])^2]
\]


The variance, which is equal to the expected square of the difference between X and 
its expected value, is a measure of the spread of the possible values of X. A useful 
identity is 


\[
\operatorname{Var}(X) = E[X^2] - (E[X])^2
\]

The quantity $\sqrt{\operatorname{Var}(X)}$ is called the standard deviation of X.
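The definition of the variance and the identity Var(X) = E[X^2] - (E[X])^2 can be compared on any probability mass function. The sketch below represents a pmf as a dictionary from values to probabilities and uses a fair die as the test case; the function names and the choice of a die are illustrative assumptions.

```python
from fractions import Fraction

def mean(pmf):
    """E[X] = sum of x p(x) over the support."""
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_identity(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    return second_moment - mean(pmf) ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, for illustration
print(mean(die), variance_by_definition(die), variance_by_identity(die))
```

The second-moment computation is an instance of the identity for E[g(X)] with g(x) = x^2.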


We now note some common types of discrete random variables. The random
variable X whose probability mass function is given by

\[
p(i) = \binom{n}{i} p^i (1 - p)^{n - i}, \quad i = 0, \ldots, n
\]


is said to be a binomial random variable with parameters n and p. Such a random 
variable can be interpreted as being the number of successes that occur when n 
independent trials, each of which results in a success with probability p, are 
performed. Its mean and variance are given by 


\[
E[X] = np \qquad \operatorname{Var}(X) = np(1 - p)
\]


The random variable X whose probability mass function is given by 


\[
p(i) = \frac{e^{-\lambda} \lambda^i}{i!}, \quad i \ge 0
\]

is said to be a Poisson random variable with parameter λ. If a large number of
(approximately) independent trials are performed, each having a small probability of
being successful, then the number of successful trials that result will have a
distribution that is approximately that of a Poisson random variable. The mean and
variance of a Poisson random variable are both equal to its parameter λ. That is,

\[
E[X] = \operatorname{Var}(X) = \lambda
\]


The random variable X whose probability mass function is given by 

\[
p(i) = p(1 - p)^{i - 1}, \quad i = 1, 2, \ldots
\]


is said to be a geometric random variable with parameter p. Such a random variable 
represents the trial number of the first success when each trial is independently a 
success with probability p. Its mean and variance are given by 


\[
E[X] = \frac{1}{p} \qquad \operatorname{Var}(X) = \frac{1 - p}{p^2}
\]

The random variable X whose probability mass function is given by

\[
p(i) = \binom{i - 1}{r - 1} p^r (1 - p)^{i - r}, \quad i \ge r
\]

is said to be a negative binomial random variable with parameters r and p. Such a
random variable represents the trial number of the rth success when each trial is
independently a success with probability p. Its mean and variance are given by

\[
E[X] = \frac{r}{p} \qquad \operatorname{Var}(X) = \frac{r(1 - p)}{p^2}
\]

A hypergeometric random variable X with parameters n, N, and m represents the
number of white balls selected when n balls are randomly chosen from an urn that
contains N balls of which m are white. The probability mass function of this random
variable is given by

\[
p(i) = \frac{\binom{m}{i} \binom{N - m}{n - i}}{\binom{N}{n}}, \quad i = 0, \ldots, m
\]

With p = m/N, its mean and variance are

\[
E[X] = np \qquad \operatorname{Var}(X) = \frac{N - n}{N - 1}\, np(1 - p)
\]
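The means and variances quoted in this summary can be spot-checked by summing over each probability mass function. The sketch below is a numerical check under stated assumptions: the parameter values in the usage lines are arbitrary, the Poisson and geometric sums are truncated at a `cutoff` (so those results are approximate), and the negative binomial case is omitted for brevity.

```python
from fractions import Fraction
from math import comb, exp, factorial

def moments(pmf):
    """Return (mean, variance) of a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum(x * x * p for x, p in pmf.items()) - mu ** 2
    return mu, var

def binomial_pmf(n, p):
    return {i: comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)}

def hypergeometric_pmf(n, N, m):
    return {i: Fraction(comb(m, i) * comb(N - m, n - i), comb(N, n))
            for i in range(0, min(n, m) + 1)}

def poisson_pmf(lam, cutoff=100):
    # Truncated at `cutoff`; the neglected tail is negligible for small lam.
    return {i: exp(-lam) * lam ** i / factorial(i) for i in range(cutoff)}

def geometric_pmf(p, cutoff=2000):
    # Truncated at `cutoff`; the neglected tail has mass (1 - p)^(cutoff - 1).
    return {i: p * (1 - p) ** (i - 1) for i in range(1, cutoff)}

print(moments(binomial_pmf(10, Fraction(1, 3))))   # exact: np, np(1 - p)
print(moments(hypergeometric_pmf(4, 10, 3)))       # exact: np, (N-n)/(N-1) np(1-p)
print(moments(poisson_pmf(3.0)))                   # approximately (3, 3)
print(moments(geometric_pmf(0.25)))                # approximately (4, 12)
```

Using `Fraction` for the binomial and hypergeometric cases makes the comparison with the closed forms exact rather than approximate.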


An important property of the expected value is that the expected value of a sum of
random variables is equal to the sum of their expected values. That is,

\[
E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]
\]


Problems 


4.1. Two balls are chosen randomly from an urn containing 8 white, 4 
black, and 2 orange balls. Suppose that we win $2 for each black ball 
selected and we lose $1 for each white ball selected. Let X denote 
our winnings. What are the possible values of X, and what are the 
probabilities associated with each value? 

4.2. Two fair dice are rolled. Let X equal the product of the 2 dice. 
Compute P{X = i} for i = 1,..., 36. 

4.3. Three dice are rolled. By assuming that each of the 6^3 = 216
possible outcomes is equally likely, find the probabilities attached to 
the possible values that X can take on, where X is the sum of the 3 
dice. 

4.4. Five men and 5 women are ranked according to their scores on 
an examination. Assume that no two scores are alike and all 10! 
possible rankings are equally likely. Let X denote the highest ranking 
achieved by a woman. (For instance, X = 1 if the top-ranked person
is female.) Find P{X = i}, i = 1, 2, 3, ..., 8, 9, 10.
4.5. Let X represent the difference between the number of heads and 
the number of tails obtained when a coin is tossed n times. What are 
the possible values of X? 
4.6. In Problem 4.5, for n = 3, if the coin is assumed fair, what
are the probabilities associated with the values that X can take on? 
4.7. Suppose that a die is rolled twice. What are the possible values 
that the following random variables can take on: 

a. the maximum value to appear in the two rolls; 

b. the minimum value to appear in the two rolls; 

c. the sum of the two rolls; 

d. the value of the first roll minus the value of the second roll? 


4.8. If the die in Problem 4.7 is assumed fair, calculate the
probabilities associated with the random variables in parts (a)
through (d).

4.9. Repeat Example 1c when the balls are selected with
replacement.

4.10. Let X be the winnings of a gambler. Let p(i) = P(X = i) and
suppose that

\[
\begin{aligned}
p(0) &= 1/3; \quad p(1) = p(-1) = 13/55; \\
p(2) &= p(-2) = 1/11; \quad p(3) = p(-3) = 1/165
\end{aligned}
\]

Compute the conditional probability that the gambler wins
i, i = 1, 2, 3, given that he wins a positive amount.

4.11. The random variable X is said to follow the distribution of 
Benford’s Law if 


\[
P(X = i) = \log_{10}\left(\frac{i + 1}{i}\right), \quad i = 1, 2, 3, \ldots, 9
\]

It has been shown to be a good fit for the distribution of the first digit
of many real-life data values.
a. Verify that the preceding is a probability mass function by
showing that $\sum_{i=1}^{9} P(X = i) = 1$.
b. Find P(X ≤ j).


4.12. In the game of Two-Finger Morra, 2 players show 1 or 2 fingers 
and simultaneously guess the number of fingers their opponent will 
show. If only one of the players guesses correctly, he wins an 
amount (in dollars) equal to the sum of the fingers shown by him and 
his opponent. If both players guess correctly or if neither guesses
correctly, then no money is exchanged. Consider a specified player,
and denote by X the amount of money he wins in a single game of 
Two-Finger Morra. 

a. If each player acts independently of the other, and if each 
player makes his choice of the number of fingers he will hold 
up and the number he will guess that his opponent will hold up 
in such a way that each of the 4 possibilities is equally likely, 
what are the possible values of X and what are their 
associated probabilities? 

b. Suppose that each player acts independently of the other. If 
each player decides to hold up the same number of fingers 
that he guesses his opponent will hold up, and if each player 
is equally likely to hold up 1 or 2 fingers, what are the possible 
values of X and their associated probabilities? 


4.13. A salesman has scheduled two appointments to sell vacuum 
cleaners. His first appointment will lead to a sale with probability .3, 
and his second will lead independently to a sale with probability .6. 
Any sale made is equally likely to be either for the deluxe model, 
which costs $1000, or the standard model, which costs $500. 
Determine the probability mass function of X, the total dollar value of 
all sales. 

4.14. Five distinct numbers are randomly distributed to players 
numbered 1 through 5. Whenever two players compare their 
numbers, the one with the higher one is declared the winner. Initially, 
players 1 and 2 compare their numbers; the winner then compares 
her number with that of player 3, and so on. Let X denote the number 
of times player 1 is a winner. Find P{X = i}, i = 0,1, 2, 3,4. 

4.15. The National Basketball Association (NBA) draft lottery involves 
the 11 teams that had the worst won-lost records during the year. A 
total of 66 balls are placed in an urn. Each of these balls is inscribed 
with the name of a team: Eleven have the name of the team with the 
worst record, 10 have the name of the team with the second-worst 
record, 9 have the name of the team with the third-worst record, and 
so on (with 1 ball having the name of the team with the 11th-worst 
record). A ball is then chosen at random, and the team whose name 
is on the ball is given the first pick in the draft of players about to 
enter the league. Another ball is then chosen, and if it “belongs” to a 
team different from the one that received the first draft pick, then the 
team to which it belongs receives the second draft pick. (If the ball 
belongs to the team receiving the first pick, then it is discarded and
another one is chosen; this continues until the ball of another team is
chosen.) Finally, another ball is chosen, and the team named on the
ball (provided that it is different from the previous two teams)
receives the third draft pick. The remaining draft picks 4 through 11
are then awarded to the 8 teams that did not “win the lottery,” in
inverse order of their won-lost records. For instance, if the team with
the worst record did not receive any of the 3 lottery picks, then that
team would receive the fourth draft pick. Let X denote the draft pick
of the team with the worst record. Find the probability mass function 
of X. 

4.16. A deck of n cards numbered 1 through n is to be turned over
one at a time. Before each card is shown you are to guess which card it
will be. After making your guess, you are told whether or not your 
guess is correct but not which card was turned over. It turns out that 
the strategy that maximizes the expected number of correct guesses 
fixes a permutation of the n cards, say 1, 2,...,n, and then continually 
guesses 1 until it is correct, then continually guesses 2 until either it 
is correct or all cards have been turned over, and then continually 
guesses 3, and so on. Let G denote the number of correct guesses 
yielded by this strategy. Determine P(G = k). 

Hint: In order for G to be at least k what must be the order of cards
1, ..., k?

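4.17. Suppose that the distribution function of X is given by

\[
F(b) =
\begin{cases}
0, & b < 0 \\
\dfrac{b}{4}, & 0 \le b < 1 \\
\dfrac{1}{2} + \dfrac{b - 1}{4}, & 1 \le b < 2 \\
\dfrac{11}{12}, & 2 \le b < 3 \\
1, & 3 \le b
\end{cases}
\]

a. Find P{X = i}, i = 1, 2, 3.
b. Find P{1/2 < X < 3/2}.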


4.18. Four independent flips of a fair coin are made. Let X denote the 
number of heads obtained. Plot the probability mass function of the 
random variable X — 2. 

4.19. If the distribution function of X is given by

\[
F(b) =
\begin{cases}
0, & b < 0 \\
\dfrac{1}{2}, & 0 \le b < 1 \\
\dfrac{3}{5}, & 1 \le b < 2 \\
\dfrac{4}{5}, & 2 \le b < 3 \\
\dfrac{9}{10}, & 3 \le b < 3.5 \\
1, & b \ge 3.5
\end{cases}
\]

calculate the probability mass function of X.
4.20. A gambling book recommends the following “winning strategy” 
for the game of roulette: Bet $1 on red. If red appears (which has
probability 18/38), then take the $1 profit and quit. If red does not
appear and you lose this bet (which has probability 20/38 of occurring),
make additional $1 bets on red on each of the next two spins of the
roulette wheel and then quit. Let X denote your winnings when you 
quit. 
a. Find P{X > 0}. 
b. Are you convinced that the strategy is indeed a “winning” 
strategy? Explain your answer! 
c. Find E[X]. 


4.21. Four buses carrying 148 students from the same school arrive 
at a football stadium. The buses carry, respectively, 40, 33, 25, and 
50 students. One of the students is randomly selected. Let X denote 
the number of students who were on the bus carrying the randomly 
selected student. One of the 4 bus drivers is also randomly selected. 
Let Y denote the number of students on her bus. 

a. Which of E[X] or E[Y] do you think is larger? Why? 

b. Compute E[X] and E[Y]. 


4.22. Suppose that two teams play a series of games that ends when 
one of them has won i games. Suppose that each game played is, 
independently, won by team A with probability p. Find the expected 
number of games that are played when (a) i = 2 and (b) i = 3. Also,
show in both cases that this number is maximized when p = 1/2.


4.23. You have $1000, and a certain commodity presently sells for $2 
per ounce. Suppose that after one week the commodity will sell for
either $1 or $4 an ounce, with these two possibilities being equally
likely. 

a. If your objective is to maximize the expected amount of money 
that you possess at the end of the week, what strategy should 
you employ? 

b. If your objective is to maximize the expected amount of the 
commodity that you possess at the end of the week, what 
strategy should you employ? 


4.24. A and B play the following game: A writes down either number 
1 or number 2, and B must guess which one. If the number that A 
has written down is i and B has guessed correctly, B receives i units
from A. If B makes a wrong guess, B pays 3/4 unit to A. If B
randomizes his decision by guessing 1 with probability p and 2 with
probability 1 - p, determine his expected gain if (a) A has written
down number 1 and (b) A has written down number 2.
What value of p maximizes the minimum possible value of B's
expected gain, and what is this maximin value? (Note that B's
expected gain depends not only on p, but also on what A does.)
Consider now player A. Suppose that she also randomizes her
decision, writing down number 1 with probability q. What is A's
expected loss if (c) B chooses number 1 and (d) B chooses number
2?
What value of q minimizes A's maximum expected loss? Show that
the minimum of A's maximum expected loss is equal to the maximum
of B's minimum expected gain. This result, known as the minimax
theorem, was first established in generality by the mathematician 
John von Neumann and is the fundamental result in the 
mathematical discipline known as the theory of games. The common 
value is called the value of the game to player B. 
4.25. Two coins are to be flipped. The first coin will land on heads 
with probability .6, the second with probability .7. Assume that the 
results of the flips are independent, and let X equal the total number 
of heads that result. 

a. Find P{X = 1}.

b. Determine E[X]. 


4.26. One of the numbers 1 through 10 is randomly chosen. You are 
to try to guess the number chosen by asking questions with “yes—no” 
answers. Compute the expected number of questions you will need 
to ask in each of the following two cases:
a. Your ith question is to be “Is it i?” i = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
b. With each question you try to eliminate one-half of the 
remaining numbers, as nearly as possible. 


4.27. An insurance company writes a policy to the effect that an 
amount of money A must be paid if some event E occurs within a 
year. If the company estimates that E will occur within a year with
probability p, what should it charge the customer in order that its 
expected profit will be 10 percent of A? 
4.28. A sample of 3 items is selected at random from a box 
containing 20 items of which 4 are defective. Find the expected 
number of defective items in the sample. 
4.29. There are two possible causes for a breakdown of a machine. 
To check the first possibility would cost C_1 dollars, and, if that were
the cause of the breakdown, the trouble could be repaired at a cost
of R_1 dollars. Similarly, there are costs C_2 and R_2 associated with the
second possibility. Let p and 1 - p denote, respectively, the
probabilities that the breakdown is caused by the first and second
possibilities. Under what conditions on p, C_i, R_i, i = 1, 2, should we
check the first possible cause of breakdown and then the second, as 
opposed to reversing the checking order, so as to minimize the 
expected cost involved in returning the machine to working order? 
Note: If the first check is negative, we must still check the other 
possibility. 
4.30. A person tosses a fair coin until a tail appears for the first time. 
If the tail appears on the nth flip, the person wins 2^n dollars. Let X
denote the player’s winnings. Show that E[X] = +∞. This problem
is known as the St. Petersburg paradox.

a. Would you be willing to pay $1 million to play this game once?

b. Would you be willing to pay $1 million for each game if you
could play for as long as you liked and only had to settle up
when you stopped playing?


4.31. Each night different meteorologists give us the probability that it 
will rain the next day. To judge how well these people predict, we will 
score each of them as follows: If a meteorologist says that it will rain 
with probability p, then he or she will receive a score of 

\[
\begin{cases}
1 - (1 - p)^2 & \text{if it does rain} \\
1 - p^2 & \text{if it does not rain}
\end{cases}
\]

We will then keep track of scores over a certain time span and
conclude that the meteorologist with the highest average score is the
best predictor of weather. Suppose now that a given meteorologist is 
aware of our scoring mechanism and wants to maximize his or her 
expected score. If this person truly believes that it will rain tomorrow 
with probability p*, what value of p should he or she assert so as to 
maximize the expected score? 

4.32. To determine whether they have a certain disease, 100 people 
are to have their blood tested. However, rather than testing each 
individual separately, it has been decided first to place the people 
into groups of 10. The blood samples of the 10 people in each group 
will be pooled and analyzed together. If the test is negative, one test 
will suffice for the 10 people, whereas if the test is positive, each of 
the 10 people will also be individually tested and, in all, 11 tests will 
be made on this group. Assume that the probability that a person has 
the disease is .1 for all people, independently of one another, and 
compute the expected number of tests necessary for each group. 
(Note that we are assuming that the pooled test will be positive if at 
least one person in the pool has the disease.) 

4.33. A newsboy purchases papers at 10 cents and sells them at 15 
cents. However, he is not allowed to return unsold papers. If his daily
demand is a binomial random variable with n = 10, p = 1/3,
approximately how many papers should he purchase so as to
maximize his expected profit?
4.34. In Example 4b, suppose that the department store incurs an
additional cost of c for each unit of unmet demand. (This type of cost 
is often referred to as a goodwill cost because the store loses the 
goodwill of those customers whose demands it cannot meet.) 
Compute the expected profit when the store stocks s units, and 
determine the value of s that maximizes the expected profit. 
4.35. A box contains 5 red and 5 blue marbles. Two marbles are 
withdrawn randomly. If they are the same color, then you win $1.10; if 
they are different colors, then you win —$1.00. (That is, you lose 
$1.00.) Calculate 

a. the expected value of the amount you win; 

b. the variance of the amount you win. 


4.36. Consider the friendship network described by Figure 4.5.
Let X be a randomly chosen person and let Z be a randomly chosen
friend of X. With f(i) equal to the number of friends of person i, show
that E[f(Z)] ≥ E[f(X)].
4.37. Consider Problem 4.22 with i = 2. Find the variance of the
number of games played, and show that this number is maximized
when p = 1/2.


4.38. Find Var(X) and Var(Y) for X and Y as given in Problem
4.21.
4.39. If E[X] = 1 and Var(X) = 5, find

a. E[(2 + X)^2];

b. Var(4 + 3X). 


4.40. A ball is drawn from an urn containing 3 white and 3 black 
balls. After the ball is drawn, it is replaced and another ball is drawn. 
This process goes on indefinitely. What is the probability that of the 
first 4 balls drawn, exactly 2 are white? 
4.41. On a multiple-choice exam with 3 possible answers for each of 
the 5 questions, what is the probability that a student will get 4 or 
more correct answers just by guessing? 
4.42. A man claims to have extrasensory perception. As a test, a fair
coin is flipped 10 times and the man is asked to predict the outcome 
in advance. He gets 7 out of 10 correct. What is the probability that 
he would have done at least this well if he did not have ESP? 
4.43. A and B will take the same 10-question examination. Each 
question will be answered correctly by A with probability .7,
independently of her results on other questions. Each question will 
be answered correctly by B with probability .4, independently both of 
her results on the other questions and on the performance of A. 

a. Find the expected number of questions that are answered 

correctly by both A and B. 
b. Find the variance of the number of questions that are 
answered correctly by either A or B. 


4.44. A communications channel transmits the digits 0 and 1. 
However, due to static, the digit transmitted is incorrectly received 
with probability .2. Suppose that we want to transmit an important 
message consisting of one binary digit. To reduce the chance of 
error, we transmit 00000 instead of 0 and 11111 instead of 1. If the 
receiver of the message uses “majority” decoding, what is the 
probability that the message will be wrong when decoded? What 
independence assumptions are you making? 

4.45. A satellite system consists of n components and functions on 
any given day if at least k of the n components function on that day. 
On a rainy day, each of the components independently functions with 
probability p_1, whereas on a dry day, each independently functions
with probability p_2. If the probability of rain tomorrow is α, what is the
probability that the satellite system will function? 

4.46. A student is getting ready to take an important oral examination 
and is concerned about the possibility of having an “on” day or an 
“off” day. He figures that if he has an on day, then each of his
examiners will pass him, independently of one another, with 
probability .8, whereas if he has an off day, this probability will be 
reduced to .4. Suppose that the student will pass the examination if a 
majority of the examiners pass him. If the student believes that he is 
twice as likely to have an off day as he is to have an on day, should 
he request an examination with 3 examiners or with 5 examiners? 
4.47. Suppose that it takes at least 9 votes from a 12-member jury to 
convict a defendant. Suppose also that the probability that a juror 
votes a guilty person innocent is .2, whereas the probability that the 
juror votes an innocent person guilty is .1. If each juror acts 
independently and if 65 percent of the defendants are guilty, find the 
probability that the jury renders a correct decision. What percentage 
of defendants is convicted? 

4.48. In some military courts, 9 judges are appointed. However, both 
the prosecution and the defense attorneys are entitled to a 
peremptory challenge of any judge, in which case that judge is 
removed from the case and is not replaced. A defendant is declared 
guilty if the majority of judges cast votes of guilty, and he or she is 
declared innocent otherwise. Suppose that when the defendant is, in 
fact, guilty, each judge will (independently) vote guilty with probability 
.7, whereas when the defendant is, in fact, innocent, this probability 
drops to .3. 

a. What is the probability that a guilty defendant is declared 
guilty when there are (i) 9, (ii) 8, and (iii) 7 judges? 

b. Repeat part (a) for an innocent defendant. 

c. If the prosecuting attorney does not exercise the right to a 
peremptory challenge of a judge, and if the defense is limited 
to at most two such challenges, how many challenges should 
the defense attorney make if he or she is 60 percent certain 
that the client is guilty? 


4.49. It is known that diskettes produced by a certain company will 
be defective with probability .01, independently of one another. The 
company sells the diskettes in packages of size 10 and offers a 
money-back guarantee that at most 1 of the 10 diskettes in the 
package will be defective. The guarantee is that the customer can
return the entire package of diskettes if he or she finds more than 1
defective diskette in it. If someone buys 3 packages, what is the 
probability that he or she will return exactly 1 of them? 
4.50. When coin 1 is flipped, it lands on heads with probability .4;
when coin 2 is flipped, it lands on heads with probability .7. One of 
these coins is randomly chosen and flipped 10 times. 
a. What is the probability that the coin lands on heads on exactly 
7 of the 10 flips? 
b. Given that the first of these 10 flips lands heads, what is the 
conditional probability that exactly 7 of the 10 flips land on 
heads? 


4.51. Each member of a population of size n is, independently, 
female with probability p or male with probability 1 - p. Let X be the 
number of the other n - 1 members of the population that are the 
same sex as is person 1. (So X = n - 1 if all n people are of the 
same sex.) 

a. Find P(X = i), i = 0, ..., n - 1. 


Now suppose that two people of the same sex will, independently of 
other pairs, be friends with probability α, whereas two persons of 
opposite sexes will be friends with probability β. Find the probability 
mass function of the number of friends of person 1. 

4.52. In a tournament involving players 1, 2, 3, 4, players 1 and 2 play 
a game, with the loser departing and the winner then playing against 
player 3, with the loser of that game departing and the winner then 
playing player 4. The winner of the game involving player 4 is the 
tournament winner. Suppose that a game between players i and j is 
won by player i with probability i/(i + j). 


a. Find the expected number of games played by player 1. 
b. Find the expected number of games played by player 3. 


4.53. Suppose that a biased coin that lands on heads with probability 
p is flipped 10 times. Given that a total of 6 heads results, find the 
conditional probability that the first 3 outcomes are 
a. h, t, t (meaning that the first flip results in heads, the second 
in tails, and the third in tails); 
b. t, h, t. 


4.54. The expected number of typographical errors on a page of a 
certain magazine is .2. What is the probability that the next page you 
read contains (a) 0 and (b) 2 or more typographical errors? Explain 
your reasoning! 

4.55. The monthly worldwide average number of airplane crashes of 
commercial airlines is 3.5. What is the probability that there will be 
a. at least 2 such accidents in the next month; 
b. at most 1 accident in the next month? Explain your reasoning! 


4.56. Approximately 80,000 marriages took place in the state of New 
York last year. Estimate the probability that for at least one of these 
couples, 
a. both partners were born on April 30; 
b. both partners celebrated their birthday on the same day of the 
year. 


State your assumptions. 
4.57. State your assumptions. Suppose that the average number of 
cars abandoned weekly on a certain highway is 2.2. Approximate the 
probability that there will be 

a. no abandoned cars in the next week; 

b. at least 2 abandoned cars in the next week. 


4.58. A certain typing agency employs 2 typists. The average 
number of errors per article is 3 when typed by the first typist and 4.2 
when typed by the second. If your article is equally likely to be typed 
by either typist, approximate the probability that it will have no errors. 

4.59. How many people are needed so that the probability that at 
least one of them has the same birthday as you is greater than 1/2? 
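
The condition in Problem 4.59 is that 1 - (364/365)^n exceeds 1/2, which can be checked by direct iteration. This is a small sketch under the usual assumptions (365 equally likely birthdays, people independent of one another); the function name `people_needed` and its parameters are illustrative only.

```python
def people_needed(target=0.5, days=365):
    """Smallest n with 1 - (1 - 1/days)**n > target."""
    n = 0
    prob_no_match = 1.0  # probability that none of the first n people shares your birthday
    while 1.0 - prob_no_match <= target:
        n += 1
        prob_no_match *= (days - 1) / days
    return n

print(people_needed())
```

Note that this is a different question from the classical birthday problem, in which any two people may match; here every match must be with one fixed person, so far more people are required.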


4.60. Suppose that the number of accidents occurring on a highway 
each day is a Poisson random variable with parameter λ = 3. 
a. Find the probability that 3 or more accidents occur today. 
b. Repeat part (a) under the assumption that at least 1 accident 
occurs today. 


4.61. Compare the Poisson approximation with the correct binomial 
probability for the following cases: 

a. P{X = 2} when n = 8, p= .1; 

b. P{X = 9} when n = 10, p = .95; 

c. P{X = 0} when n = 10, p= .1; 

d. P{X = 4} when n = 9, p= .2. 
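
The comparison asked for in Problem 4.61 can be tabulated with a few lines of code. The sketch below assumes the standard pairing λ = np between the binomial and Poisson distributions; the helper names `binomial_pmf` and `poisson_pmf` are chosen here for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P{X = k} for X binomial with parameters (n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P{X = k} for X Poisson with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

# (k, n, p) for parts (a) through (d)
cases = [(2, 8, 0.1), (9, 10, 0.95), (0, 10, 0.1), (4, 9, 0.2)]
for k, n, p in cases:
    print(f"k={k} n={n} p={p}: binomial={binomial_pmf(k, n, p):.4f} "
          f"poisson={poisson_pmf(k, n * p):.4f}")
```

Case (b) is worth a second look: with p = .95 the approximation applied to successes is poor, which suggests counting failures (probability .05) instead.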


4.62. If you buy a lottery ticket in 50 lotteries, in each of which your 
chance of winning a prize is 1/100, what is the (approximate) 
probability that you will win a prize 
a. at least once? 
b. exactly once? 
c. at least twice? 


4.63. The number of times that a person contracts a cold in a given 
year is a Poisson random variable with parameter λ = 5. Suppose 
that a new wonder drug (based on large quantities of vitamin C) has 
just been marketed that reduces the Poisson parameter to λ = 3 for 
75 percent of the population. For the other 25 percent of the 
population, the drug has no appreciable effect on colds. If an 
individual tries the drug for a year and has 2 colds in that time, how 
likely is it that the drug is beneficial for him or her? 

4.64. The probability of being dealt a full house in a hand of poker is 
approximately .0014. Find an approximation for the probability that in 
1000 hands of poker, you will be dealt at least 2 full houses. 

4.65. Consider n independent trials, each of which results in one of 
the outcomes 1, ..., k with respective probabilities 
p_1, ..., p_k, $\sum_{i=1}^{k} p_i = 1$. Show that if all the p_i are small, then the 
probability that no trial outcome occurs more than once is 
approximately equal to $\exp\left(-n(n-1)\sum_i p_i^2 / 2\right)$. 


4.66. People enter a gambling casino at a rate of 1 every 2 minutes. 
a. What is the probability that no one enters between 12:00 and 
12:05? 
b. What is the probability that at least 4 people enter the casino 
during that time? 


4.67. The suicide rate in a certain state is 1 suicide per 100,000 
inhabitants per month. 

a. Find the probability that in a city of 400,000 inhabitants within 
this state, there will be 8 or more suicides in a given month. 

b. What is the probability that there will be at least 2 months 
during the year that will have 8 or more suicides? 

c. Counting the present month as month number 1, what is the 
probability that the first month to have 8 or more suicides will 
be month number i, i ≥ 1? What assumptions are you 
making? 

4.68. Each of 500 soldiers in an army company independently has a 
certain disease with probability 1/10^3. This disease will show up in a 
blood test, and to facilitate matters, blood samples from all 500 
soldiers are pooled and tested. 

a. What is the (approximate) probability that the blood test will be 
positive (that is, at least one person has the disease)? 
Suppose now that the blood test yields a positive result. 

b. What is the probability, under this circumstance, that more 
than one person has the disease? 

Now, suppose one of the 500 people is Jones, who knows that 
he has the disease. 

c. What does Jones think is the probability that more than one 
person has the disease? 

Because the pooled test was positive, the authorities have 
decided to test each individual separately. The first i - 1 of 
these tests were negative, and the ith one, which was on 
Jones, was positive. 

d. Given the preceding scenario, what is the probability, as a 
function of i, that any of the remaining people have the 
disease? 


4.69. A total of 2n people, consisting of n married couples, are 
randomly seated (all possible orderings being equally likely) at a 
round table. Let C_i denote the event that the members of couple i are 
seated next to each other, i = 1, ..., n. 

a. Find P(C_i). 

b. For j ≠ i, find P(C_j | C_i). 

c. Approximate the probability, for n large, that there are no 
married couples who are seated next to each other. 


4.70. Repeat the preceding problem when the seating is random but 
subject to the constraint that the men and women alternate. 

4.71. In response to an attack of 10 missiles, 500 antiballistic 
missiles are launched. The missile targets of the antiballistic missiles 
are independent, and each antiballistic missile is equally likely to go 
towards any of the target missiles. If each antiballistic missile 
independently hits its target with probability .1, use the Poisson 
paradigm to approximate the probability that all missiles are hit. 

4.72. A fair coin is flipped 10 times. Find the probability that there is a 
string of 4 consecutive heads by 

a. using the formula derived in the text; 
b. using the recursive equations derived in the text. 
c. Compare your answer with that given by the Poisson 
approximation. 


4.73. At time 0, a coin that comes up heads with probability p is 
flipped and falls to the ground. Suppose it lands on heads. At times 
chosen according to a Poisson process with rate λ, the coin is picked 
up and flipped. (Between these times, the coin remains on the 
ground.) What is the probability that the coin is on its head side at 
time t? 
Hint: What would be the conditional probability if there were no 
additional flips by time t, and what would it be if there were additional 
flips by time t? 

4.74. Consider a roulette wheel consisting of 38 numbers: 1 through 
36, 0, and double 0. If Smith always bets that the outcome will be 
one of the numbers 1 through 12, what is the probability that 

a. Smith will lose his first 5 bets; 

b. his first win will occur on his fourth bet? 


4.75. Two athletic teams play a series of games; the first team to win 
4 games is declared the overall winner. Suppose that one of the 
teams is stronger than the other and wins each game with probability 
.6, independently of the outcomes of the other games. Find the 
probability, for i = 4, 5, 6, 7, that the stronger team wins the series in 
exactly i games. Compare the probability that the stronger team wins 
with the probability that it would win a 2-out-of-3 series. 
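
The series probabilities in Problem 4.75 follow the negative binomial pattern: the deciding win occurs in game i, and the remaining wins fall among the first i - 1 games. The following sketch assumes independent games; the function names and the `needed` parameter are hypothetical conveniences so that the same code covers the 2-out-of-3 comparison.

```python
from math import comb

def wins_series_in(i, p, needed=4):
    """P(a team with per-game win probability p clinches a
    first-to-`needed` series in exactly i games)."""
    if i < needed or i > 2 * needed - 1:
        return 0.0
    # Game i is the clinching win; the other needed-1 wins are among the first i-1 games.
    return comb(i - 1, needed - 1) * p**needed * (1 - p)**(i - needed)

def wins_series(p, needed=4):
    """P(the team wins the series), summing over all possible series lengths."""
    return sum(wins_series_in(i, p, needed) for i in range(needed, 2 * needed))

print([wins_series_in(i, 0.6) for i in range(4, 8)])
print(wins_series(0.6, needed=4), wins_series(0.6, needed=2))
```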

4.76. Suppose in Problem 4.75 that the two teams are evenly 
matched and each has probability 1/2 of winning each game. Find the 
expected number of games played. 

4.77. An interviewer is given a list of people she can interview. If the 
interviewer needs to interview 5 people, and if each person 
(independently) agrees to be interviewed with probability 2/3, what is 
the probability that her list of people will enable her to obtain her 
necessary number of interviews if the list consists of (a) 5 people and 
(b) 8 people? For part (b), what is the probability that the interviewer 
will speak to exactly (c) 6 people and (d) 7 people on the list? 

4.78. A fair coin is continually flipped until heads appears for the 10th 
time. Let X denote the number of tails that occur. Compute the 
probability mass function of X. 

4.79. Solve the Banach match problem (Example 8e) when the 
left-hand matchbox originally contained N_1 matches and the right-hand 
box contained N_2 matches. 

4.80. In the Banach matchbox problem, find the probability that at the 
moment when the first box is emptied (as opposed to being found 
empty), the other box contains exactly k matches. 

4.81. An urn contains 4 white and 4 black balls. We randomly choose 
4 balls. If 2 of them are white and 2 are black, we stop. If not, we 
replace the balls in the urn and again randomly select 4 balls. This 
continues until exactly 2 of the 4 chosen are white. What is the 
probability that we shall make exactly n selections? 

4.82. Suppose that a batch of 100 items contains 6 that are defective 
and 94 that are not defective. If X is the number of defective items in 
a randomly drawn sample of 10 items from the batch, find (a) 

P{X = 0} and (b) P{X > 2}. 
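
Problem 4.82 involves a hypergeometric random variable, whose mass function is a ratio of binomial coefficients. A minimal sketch follows; the function name `hypergeom_pmf` and its argument names are illustrative, not from the text.

```python
from math import comb

def hypergeom_pmf(k, population, defectives, sample):
    """P{exactly k defectives in a random sample drawn without replacement}."""
    return (comb(defectives, k) * comb(population - defectives, sample - k)
            / comb(population, sample))

# Batch of 100 items with 6 defective; sample of 10.
p0 = hypergeom_pmf(0, 100, 6, 10)
p_more_than_2 = 1 - sum(hypergeom_pmf(k, 100, 6, 10) for k in range(3))
print(p0, p_more_than_2)
```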

4.83. A game popular in Nevada gambling casinos is Keno, which is 
played as follows: Twenty numbers are selected at random by the 
casino from the set of numbers 1 through 80. A player can select 
from 1 to 15 numbers; a win occurs if some fraction of the player’s 
chosen subset matches any of the 20 numbers drawn by the house. 
The payoff is a function of the number of elements in the player’s 
selection and the number of matches. For instance, if the player 
selects only 1 number, then he or she wins if this number is among 
the set of 20, and the payoff is $2.20 won for every dollar bet. (As the 
player’s probability of winning in this case is 1/4, it is clear that the 
“fair” payoff should be $3 won for every $1 bet.) When the player 
selects 2 numbers, a payoff (of odds) of $12 won for every $1 bet is 
made when both numbers are among the 20. 
a. What would be the fair payoff in this case? 
Let P_{n,k} denote the probability that exactly k of the n numbers 
chosen by the player are among the 20 selected by the house. 
b. Compute P_{n,k}. 
c. The most typical wager at Keno consists of selecting 10 
numbers. For such a bet, the casino pays off as shown in the 
following table. Compute the expected payoff: 


Keno Payoffs in 10 Number Bets 

Number of matches    Dollars won for each $1 bet 
0-4                  -1 
5                    1 
6                    17 
7                    179 
8                    1,299 
9                    2,599 
10                   24,999 


4.84. In Example 8i, what percentage of i defective lots does the 
purchaser reject? Find it for i = 1,4. Given that a lot is rejected, what 
is the conditional probability that it contained 4 defective 
components? 

4.85. A purchaser of transistors buys them in lots of 20. It is his 
policy to randomly inspect 4 components from a lot and to accept the 
lot only if all 4 are nondefective. If each component in a lot is, 
independently, defective with probability .1, what proportion of lots is 
rejected? 

4.86. There are three highways in the county. The number of daily 
accidents that occur on these highways are Poisson random 
variables with respective parameters .3, .5, and .7. Find the 
expected number of accidents that will happen on any of these 
highways today. 

4.87. Suppose that 10 balls are put into 5 boxes, with each ball 
independently being put in box i with probability p_i, 
$\sum_{i=1}^{5} p_i = 1$. 


a. Find the expected number of boxes that do not have any balls. 
b. Find the expected number of boxes that have exactly 1 ball. 


4.88. There are k types of coupons. Independently of the types of 
previously collected coupons, each new coupon collected is of type i 
with probability p_i, $\sum_{i=1}^{k} p_i = 1$. If n coupons are collected, find the 
expected number of distinct types that appear in this set. (That is, 
find the expected number of types of coupons that appear at least 
once in the set of n coupons.) 

4.89. An urn contains 10 red, 8 black, and 7 green balls. One of the 
colors is chosen at random (meaning that the chosen color is equally 
likely to be any of the 3 colors), and then 4 balls are randomly 
chosen from the urn. Let X be the number of these balls that are of 
the chosen color. 

a. Find P(X = 0). 

b. Let X_i equal 1 if the ith ball selected is of the chosen color, 
and let it equal 0 otherwise. Find P(X_i = 1), i = 1, 2, 3, 4. 
c. Find E[X]. 

Hint: Express X in terms of X_1, X_2, X_3, X_4. 


Theoretical Exercises 


4.1. There are N distinct types of coupons, and each time one is obtained it 
will, independently of past choices, be of type i with probability P_i, i = 1, ..., N. 
Let T denote the number one need select to obtain at least one of each type. 
Compute P{T = n}. 

Hint: Use an argument similar to the one used in Example 1e. 

4.2. If X has distribution function F, what is the distribution function of e^X? 

4.3. If X has distribution function F, what is the distribution function of the 
random variable αX + β, where α and β are constants, α ≠ 0? 

4.4. The random variable X is said to have the Yule-Simons distribution if 

$$P\{X = n\} = \frac{4}{n(n+1)(n+2)}, \quad n \ge 1$$

a. Show that the preceding is actually a probability mass function. That is, 
show that $\sum_{n=1}^{\infty} P\{X = n\} = 1$. 
b. Show that E[X] = 2. 
c. Show that E[X^2] = ∞. 

Hint: For (a), first use that 

$$\frac{1}{n(n+1)(n+2)} = \frac{1}{n(n+1)} - \frac{1}{n(n+2)}$$

then use that 

$$\frac{1}{n(n+k)} = \frac{1}{k}\left(\frac{1}{n} - \frac{1}{n+k}\right)$$

4.5. Let N be a nonnegative integer-valued random variable. For nonnegative 
values a_j, j ≥ 1, show that 

$$\sum_{j=1}^{\infty} (a_1 + \cdots + a_j) P\{N = j\} = \sum_{i=1}^{\infty} a_i P\{N \ge i\}$$

Then show that 

$$E[N] = \sum_{i=1}^{\infty} P\{N \ge i\}$$

and 

$$E[N(N+1)] = 2 \sum_{i=1}^{\infty} i P\{N \ge i\}$$
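
The tail-sum identities of Theoretical Exercise 4.5, namely that E[N] is the sum over i ≥ 1 of P{N ≥ i} and that E[N(N + 1)] is twice the sum of i P{N ≥ i}, can be sanity-checked on a small example before attempting the proof. The sketch below uses exact rational arithmetic; the particular mass function is a hypothetical choice (it happens to be binomial with parameters 3 and 1/2), and any finite-support mass function would serve.

```python
from fractions import Fraction

# Hypothetical example pmf on {0, 1, 2, 3}; any pmf with finite support works.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def tail(i):
    """P{N >= i}."""
    return sum(prob for value, prob in pmf.items() if value >= i)

max_value = max(pmf)

# E[N] computed directly and via the tail sum.
mean_direct = sum(n * prob for n, prob in pmf.items())
mean_tail = sum(tail(i) for i in range(1, max_value + 1))

# E[N(N+1)] computed directly and via the weighted tail sum.
second_direct = sum(n * (n + 1) * prob for n, prob in pmf.items())
second_tail = 2 * sum(i * tail(i) for i in range(1, max_value + 1))

print(mean_direct, mean_tail)
print(second_direct, second_tail)
```

A numerical check of this kind is of course not a proof; it only confirms that the identities are stated with the correct inequality (≥ rather than >).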
i=1 


4.6. Let X be such that 
P{X = 1} = p = 1 - P{X = -1} 

Find c ≠ 1 such that E[c^X] = 1. 

4.7. Let X be a random variable having expected value μ and variance σ^2. 
Find the expected value and variance of 

$$Y = \frac{X - \mu}{\sigma}$$


4.8. Find Var(X) if 
P(X =a) =p=1-P(X =b) 


4.9. Show how the derivation of the binomial probabilities 

$$P\{X = i\} = \binom{n}{i} p^i (1-p)^{n-i}, \quad i = 0, \ldots, n$$

leads to a proof of the binomial theorem 

$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$$

when x and y are nonnegative. 

Hint: Let p = x/(x + y). 

4.10. Let X be a binomial random variable with parameters n and p. Show that 

$$E\left[\frac{1}{X+1}\right] = \frac{1 - (1-p)^{n+1}}{(n+1)p}$$


4.11. Let X be the number of successes that result from 2n independent trials, 
when each trial is a success with probability p. Show that P(X = n) is a 
decreasing function of n. 

4.12. Consider n independent sequential trials, each of which is successful 
with probability p. If there is a total of k successes, show that each of the 
n!/[k!(n - k)!] possible arrangements of the k successes and n - k failures is 
equally likely. 

4.13. There are n components lined up in a linear arrangement. Suppose that 
each component independently functions with probability p. What is the 
probability that no 2 neighboring components are both nonfunctional? 
Hint: Condition on the number of defective components and use the results of 
Example 4c of Chapter 1. 

4.14. Let X be a binomial random variable with parameters (n, p). What value 
of p maximizes P{X = k},k = 0,1,...,n? This is an example of a statistical 
method used to estimate p when a binomial (n, p) random variable is 
observed to equal k. If we assume that n is known, then we estimate p by 
choosing that value of p that maximizes P{X = k}. This is known as the 
method of maximum likelihood estimation. 
4.15. A family has n children with probability αp^n, n ≥ 1, where α ≤ (1 - p)/p. 

a. What proportion of families has no children? 

b. If each child is equally likely to be a boy or a girl (independently of each 

other), what proportion of families consists of k boys (and any number 
of girls)? 


4.16. Suppose that n independent tosses of a coin having probability p of 
coming up heads are made. Show that the probability that an even number of 
heads results is $\frac{1}{2}\left[1 + (q - p)^n\right]$, where q = 1 - p. Do this by proving and then 
utilizing the identity 

$$\sum_{i=0}^{[n/2]} \binom{n}{2i} p^{2i} q^{n-2i} = \frac{1}{2}\left[(p + q)^n + (q - p)^n\right]$$

where [n/2] is the largest integer less than or equal to n/2. Compare this 
exercise with Theoretical Exercise 3.5 of Chapter 3. 
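
Before proving the result of Theoretical Exercise 4.16, it can be reassuring to confirm numerically that summing the binomial probabilities over even counts agrees with the closed form (1/2)[1 + (q - p)^n]. The following is a minimal sketch; both function names are hypothetical.

```python
from math import comb

def prob_even_heads_direct(n, p):
    """Sum the binomial pmf over even numbers of heads 0, 2, 4, ..."""
    q = 1 - p
    return sum(comb(n, 2 * i) * p**(2 * i) * q**(n - 2 * i)
               for i in range(n // 2 + 1))

def prob_even_heads_closed(n, p):
    """Closed form (1/2)[1 + (q - p)^n] with q = 1 - p."""
    q = 1 - p
    return (1 + (q - p)**n) / 2

for n, p in [(1, 0.3), (5, 0.3), (10, 0.8)]:
    print(n, p, prob_even_heads_direct(n, p), prob_even_heads_closed(n, p))
```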

4.17. Let X be a Poisson random variable with parameter λ. Show that 
P{X = i} increases monotonically and then decreases monotonically as i 
increases, reaching its maximum when i is the largest integer not exceeding λ. 

Hint: Consider P{X = i}/P{X = i - 1}. 

4.18. Let X be a Poisson random variable with parameter λ. 
a. Show that 

$$P\{X \text{ is even}\} = \frac{1}{2}\left[1 + e^{-2\lambda}\right]$$

by using the result of Theoretical Exercise 4.15 and the 
relationship between Poisson and binomial random variables. 
b. Verify the formula in part (a) directly by making use of the expansion of 
$e^{-\lambda} + e^{\lambda}$. 

4.19. Let X be a Poisson random variable with parameter λ. What value of λ 
maximizes P{X = k}, k ≥ 0? 

4.20. Show that if X is a Poisson random variable with parameter λ, then 

$$E[X^n] = \lambda E[(X+1)^{n-1}]$$

Now use this result to compute E[X^3]. 
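
The recursion of Theoretical Exercise 4.20, E[X^n] = λE[(X + 1)^(n-1)], turns into a practical algorithm once (X + 1)^(n-1) is expanded by the binomial theorem in terms of lower moments. The sketch below is one way to organize that computation and compares it with a truncated evaluation of the defining series; the function names and the truncation length `terms` are assumptions made for illustration.

```python
from math import comb, exp

def poisson_moments(lam, max_order):
    """Return [E[X^0], ..., E[X^max_order]] for X Poisson(lam), via the recursion."""
    moments = [1.0]  # E[X^0] = 1
    for n in range(1, max_order + 1):
        # E[(X+1)^(n-1)] expanded with the binomial theorem in lower moments.
        shifted = sum(comb(n - 1, j) * moments[j] for j in range(n))
        moments.append(lam * shifted)
    return moments

def poisson_moment_series(lam, order, terms=200):
    """Truncated evaluation of sum_i i^order * e^(-lam) * lam^i / i!."""
    total, weight = 0.0, exp(-lam)  # weight starts at P{X = 0}
    for i in range(terms):
        total += i**order * weight
        weight *= lam / (i + 1)     # advance to P{X = i + 1}
    return total

print(poisson_moments(2.0, 3))
print([poisson_moment_series(2.0, k) for k in range(4)])
```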
4.21. Consider n coins, each of which independently comes up heads with 
probability p. Suppose that n is large and p is small, and let λ = np. Suppose 
that all n coins are tossed; if at least one comes up heads, the experiment 
ends; if not, we again toss all n coins, and so on. That is, we stop the first time 
that at least one of the n coins comes up heads. Let X denote the total number 
of heads that appear. Which of the following reasonings concerned with 
approximating P{X = 1} is correct (in all cases, Y is a Poisson random variable 
with parameter λ)? 

a. Because the total number of heads that occur when all n coins are 
tossed is approximately a Poisson random variable with parameter λ, 

$$P\{X = 1\} \approx P\{Y = 1\} = \lambda e^{-\lambda}$$

b. Because the total number of heads that occur when all n coins are 
tossed is approximately a Poisson random variable with parameter λ, 
and because we stop only when this number is positive, 

$$P\{X = 1\} \approx P\{Y = 1 \mid Y > 0\} = \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}}$$

c. Because at least one coin comes up heads, X will equal 1 if none of the 
other n - 1 coins come up heads. Because the number of heads 
resulting from these n - 1 coins is approximately Poisson with mean 
(n - 1)p ≈ λ, 

$$P\{X = 1\} \approx P\{Y = 0\} = e^{-\lambda}$$


4.22. From a set of n randomly chosen people, let E_{i,j} denote the event that 
persons i and j have the same birthday. Assume that each person is equally 
likely to have any of the 365 days of the year as his or her birthday. Find 

a. P(E_{3,4} | E_{1,2}); 

b. P(E_{1,3} | E_{1,2}); 

c. P(E_{2,3} | E_{1,2} ∩ E_{1,3}). What can you conclude from your answers to parts 
(a)-(c) about the independence of the $\binom{n}{2}$ events E_{i,j}? 


4.23. An urn contains 2n balls, of which 2 are numbered 1, 2 are numbered 
2,..., and 2 are numbered n. Balls are successively withdrawn 2 at a time 
without replacement. Let T denote the first selection in which the balls 
withdrawn have the same number (and let it equal infinity if none of the pairs 
withdrawn has the same number). We want to show that, for 0 < α < 1, 

$$\lim_{n \to \infty} P\{T > \alpha n\} = e^{-\alpha/2}$$

To verify the preceding formula, let M_k denote the number of pairs withdrawn 
in the first k selections, k = 1, ..., n. 
Argue that when n is large, M_k can be regarded as the number of successes 
in k (approximately) independent trials. 

a. Approximate P{M_k = 0} when n is large. 

b. Write the event {T > αn} in terms of the value of one of the variables 
M_k. 
c. Verify the limiting probability given for P{T > αn}. 


4.24. Consider a random collection of n individuals. In approximating the 
probability that no 3 of these individuals share the same birthday, a better 
Poisson approximation than that obtained in the text (at least for values of n 
between 80 and 90) is obtained by letting E_i be the event that there are at 
least 3 birthdays on day i, i = 1, ..., 365. 
a. Find P(E_i). 
b. Give an approximation for the probability that no 3 individuals share the 
same birthday. 
c. Evaluate the preceding when n = 88. (The exact probability is derived 
in Example 1g of Chapter 6.) 


4.25. Here is another way to obtain a set of recursive equations for 
determining P_n, the probability that there is a string of k consecutive heads in 
a sequence of n flips of a coin that comes up heads with probability p: 
a. Argue that for k < n, there will be a string of k consecutive heads if 
either 
1. there is a string of k consecutive heads within the first n - 1 
flips, or 
2. there is no string of k consecutive heads within the first n - k - 1 
flips, flip n - k is a tail, and flips n - k + 1, ..., n are all heads. 


b. Using the preceding, relate P_n to P_{n-1}. Starting with P_k = p^k, the 
recursion can be used to obtain P_{k+1}, then P_{k+2}, and so on, up to P_n. 
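
The two cases in part (a) of Theoretical Exercise 4.25 are mutually exclusive, which suggests the relation P_n = P_{n-1} + (1 - P_{n-k-1})(1 - p)p^k, with P_j = 0 for j < k. The sketch below implements that reading of the recursion and checks it against brute-force enumeration of all flip sequences; it is one possible organization, with hypothetical function names, rather than the text's own solution.

```python
from itertools import product

def prob_run(n, k, p):
    """P_n: probability of a run of k consecutive heads in n flips, via the recursion."""
    P = [0.0] * (n + 1)          # P[j] = 0 for j < k: too few flips for a run
    for j in range(k, n + 1):
        if j == k:
            P[j] = p**k
        else:
            # First run ends exactly at flip j: no run in the first j-k-1 flips,
            # flip j-k is a tail, and the last k flips are all heads.
            P[j] = P[j - 1] + (1 - P[j - k - 1]) * (1 - p) * p**k
    return P[n]

def prob_run_brute_force(n, k, p):
    """Same probability by enumerating all 2^n sequences (small n only)."""
    total = 0.0
    for flips in product("HT", repeat=n):
        s = "".join(flips)
        if "H" * k in s:
            heads = s.count("H")
            total += p**heads * (1 - p)**(n - heads)
    return total

print(prob_run(10, 4, 0.5), prob_run_brute_force(10, 4, 0.5))
```

The enumeration grows as 2^n and is meant only as a check for small n; the recursion itself needs just n - k + 1 steps.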


4.26. Suppose that the number of events that occur in a specified time is a 
Poisson random variable with parameter λ. If each event is counted with 
probability p, independently of every other event, show that the number of 
events that are counted is a Poisson random variable with parameter λp. Also, 
give an intuitive argument as to why this should be so. As an application of the 
preceding result, suppose that the number of distinct uranium deposits in a 
given area is a Poisson random variable with parameter λ = 10. If, in a fixed 
period of time, each deposit is discovered independently with probability 1/50, 
find the probability that (a) exactly 1, (b) at least 1, and (c) at most 1 deposit is 
discovered during that time. 
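
The claim in Theoretical Exercise 4.26 can be checked numerically by conditioning on the total number of events: the probability that exactly k events are counted is a sum of Poisson probabilities times binomial probabilities. The sketch below truncates that sum at a hypothetical cutoff (`terms`), which is more than adequate for a parameter of 10, and compares the result with the Poisson mass function with parameter 10 × 1/50.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P{X = k} for X Poisson with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

def counted_pmf(k, lam, p, terms=150):
    """P{exactly k events are counted}, conditioning on the total number n of events."""
    return sum(
        poisson_pmf(n, lam) * comb(n, k) * p**k * (1 - p)**(n - k)
        for n in range(k, terms)
    )

lam, p = 10, 1 / 50  # uranium deposit example from the exercise
for k in range(3):
    print(k, counted_pmf(k, lam, p), poisson_pmf(k, lam * p))
```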
4.27. Prove 


Hint: Use integration by parts. 

4.28. If X is a geometric random variable, show analytically that 
P{X = n + k | X > n} = P{X = k} 


Using the interpretation of a geometric random variable, give a verbal 

argument as to why the preceding equation is true. 

4.29. Let X be a negative binomial random variable with parameters r and p, 

and let Y be a binomial random variable with parameters n and p. Show that 
P{X > n} = P{Y < r} 


Hint: Either one could attempt an analytical proof of the preceding equation, 
which is equivalent to proving the identity 


or one could attempt a proof that uses the probabilistic interpretation of these 
random variables. That is, in the latter case, start by considering a sequence 
of independent trials having a common probability p of success. Then try to 
express the events {X > n} and {Y < r} in terms of the outcomes of this 
sequence. 
4.30. For a hypergeometric random variable, determine 

P{X = k + 1}/P{X = k} 

4.31. Balls numbered 1 through N are in an urn. Suppose that n, n < N, of 
them are randomly selected without replacement. Let Y denote the largest 
number selected. 
a. Find the probability mass function of Y. 
b. Derive an expression for E[Y] and then use Fermat’s combinatorial 
identity (see Theoretical Exercise 11 of Chapter 1) to simplify 
the expression. 


4.32. A jar contains m + n chips, numbered 1, 2, ..., n + m. A set of size n is 
drawn. If we let X denote the number of chips drawn having numbers that 
exceed each of the numbers of those remaining, compute the probability 
mass function of X. 

4.33. A jar contains n chips. Suppose that a boy successively draws a chip 
from the jar, each time replacing the one drawn before drawing another. The 
process continues until the boy draws a chip that he has previously drawn. Let 
X denote the number of draws, and compute its probability mass function. 
4.34. Repeat Theoretical Exercise 4.33, this time assuming that 
withdrawn chips are not replaced before the next selection. 

4.35. From a set of n elements, a nonempty subset is chosen at random in the 
sense that all of the nonempty subsets are equally likely to be selected. Let X 
denote the number of elements in the chosen subset. Using the identities 
given in Theoretical Exercise 12 of Chapter 1, show that 

$$E[X] = \frac{n}{2 - \left(\frac{1}{2}\right)^{n-1}}$$

$$\mathrm{Var}(X) = \frac{n \cdot 2^{2n-2} - n(n+1)2^{n-2}}{(2^n - 1)^2}$$

Show also that for n large, 

$$\mathrm{Var}(X) \sim \frac{n}{4}$$

in the sense that the ratio Var(X) to n/4 approaches 1 as n approaches ∞. 
Compare this formula with the limiting form of Var(Y) when 
P{Y = i} = 1/n, i = 1, ..., n. 

4.36. An urn initially contains one red and one blue ball. At each stage, a ball 
is randomly chosen and then replaced along with another of the same color. 
Let X denote the selection number of the first chosen ball that is blue. For 
instance, if the first selection is red and the second blue, then X is equal to 2. 
a. Find P{X > i}, i ≥ 1. 
b. Show that with probability 1, a blue ball is eventually chosen. (That is, 
show that P{X < ∞} = 1.) 
c. Find E[X]. 


4.37. Suppose the possible values of X are {x_i}, the possible values of Y are 
{y_j}, and the possible values of X + Y are {z_k}. Let A_k denote the set of all 
pairs of indices (i, j) such that x_i + y_j = z_k; that is, A_k = {(i, j): x_i + y_j = z_k}. 
a. Argue that 

$$P\{X + Y = z_k\} = \sum_{(i,j) \in A_k} P\{X = x_i, Y = y_j\}$$

b. Show that 

$$E[X + Y] = \sum_k \sum_{(i,j) \in A_k} (x_i + y_j) P\{X = x_i, Y = y_j\}$$

c. Using the formula from part (b), argue that 

$$E[X + Y] = \sum_i \sum_j (x_i + y_j) P\{X = x_i, Y = y_j\}$$

d. Show that 

$$P\{X = x_i\} = \sum_j P\{X = x_i, Y = y_j\}, \qquad P\{Y = y_j\} = \sum_i P\{X = x_i, Y = y_j\}$$

e. Prove that 
E[X + Y] = E[X] + E[Y] 


Self-Test Problems and Exercises 


4.1. Suppose that the random variable X is equal to the number of hits 
obtained by a certain baseball player in his next 3 at bats. If 

P{X = 1} = .3, P{X = 2} = .2, and P{X = 0} = 3P{X = 3}, find E[X]. 

4.2. Suppose that X takes on one of the values 0, 1, and 2. If for some 
constant c, P{X = i} = cP{X = i - 1}, i = 1, 2, find E[X]. 

4.3. A coin that when flipped comes up heads with probability p is 
flipped until either heads or tails has occurred twice. Find the expected 
number of flips. 

4.4. A certain community is composed of m families, n_i of which have i 
children, $\sum_{i=1}^{r} n_i = m$. If one of the families is randomly chosen, let X 
denote the number of children in that family. If one of the $\sum_{i=1}^{r} i n_i$ 
children is randomly chosen, let Y denote the total number of children 
in the family of that child. Show that E[Y] ≥ E[X]. 

4.5. Suppose that P{X = 0} = 1 - P{X = 1}. If E[X] = 3Var(X), find 
P{X = 0}. 
4.6. There are 2 coins in a bin. When one of them is flipped, it lands on 
heads with probability .6, and when the other is flipped, it lands on 
heads with probability .3. One of these coins is to be randomly chosen 
and then flipped. Without knowing which coin is chosen, you can bet 
any amount up to $10, and you then either win that amount if the coin 
comes up heads or lose it if it comes up tails. Suppose, however, that 
an insider is willing to sell you, for an amount C, the information as to 
which coin was selected. What is your expected payoff if you buy this 
information? Note that if you buy it and then bet x, you will end up 
either winning x - C or -x - C (that is, losing x + C in the latter case). 
Also, for what values of C does it pay to purchase the information? 
4.7. A philanthropist writes a positive number x on a piece of red paper, 
shows the paper to an impartial observer, and then turns it face down 
on the table. The observer then flips a fair coin. If it shows heads, she 
writes the value 2x and, if tails, the value x/2, on a piece of blue paper, 
which she then turns face down on the table. Without knowing either 
the value x or the result of the coin flip, you have the option of turning 
over either the red or the blue piece of paper. After doing so and 
observing the number written on that paper, you may elect to receive 
as a reward either that amount or the (unknown) amount written on the 
other piece of paper. For instance, if you elect to turn over the blue 
paper and observe the value 100, then you can elect either to accept 
100 as your reward or to take the amount (either 200 or 50) on the red 
paper. Suppose that you would like your expected reward to be large. 

a. Argue that there is no reason to turn over the red paper first, 
because if you do so, then no matter what value you observe, it 
is always better to switch to the blue paper. 

b. Let y be a fixed nonnegative value, and consider the following 
strategy: Turn over the blue paper, and if its value is at least y, 
then accept that amount. If it is less than y, then switch to the 
red paper. Let R_y(x) denote the reward obtained if the 
philanthropist writes the amount x and you employ this strategy. 
Find E[R_y(x)]. Note that E[R_0(x)] is the expected reward if the 
philanthropist writes the amount x when you employ the strategy 
of always choosing the blue paper. 


4.8. Let B(n, p) represent a binomial random variable with parameters 
n and p. Argue that 
P{B(n, p) ≤ i} = 1 - P{B(n, 1 - p) ≤ n - i - 1} 


Hint: The number of successes less than or equal to i is equivalent to 
what statement about the number of failures? 
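
The identity in Self-Test Problem 4.8, P{B(n, p) ≤ i} = 1 - P{B(n, 1 - p) ≤ n - i - 1}, can be spot-checked numerically before constructing the argument. The sketch below uses arbitrary illustrative values of n and p; the function name `binomial_cdf` is hypothetical.

```python
from math import comb

def binomial_cdf(i, n, p):
    """P{B(n, p) <= i}; an empty sum gives 0 when i < 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, i + 1))

n, p = 12, 0.35  # arbitrary illustrative values
for i in range(n + 1):
    lhs = binomial_cdf(i, n, p)                  # at most i successes
    rhs = 1 - binomial_cdf(n - i - 1, n, 1 - p)  # not (at most n-i-1 failures)
    print(i, lhs, rhs)
```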
4.9. If X is a binomial random variable with expected value 6 and 
variance 2.4, find P{X = 5}. 
4.10. An urn contains n balls numbered 1 through n. If you withdraw m 
balls randomly in sequence, each time replacing the ball selected 
previously, find P{X = k}, k = 1, ..., n, where X is the maximum of the m 
chosen numbers. 
Hint: First find P{X ≤ k}. 
4.11. Teams A and B play a series of games, with the first team to win 3 
games being declared the winner of the series. Suppose that team A 
independently wins each game with probability p. Find the conditional 
probability that team A wins 

a. the series given that it wins the first game; 

b. the first game given that it wins the series. 


4.12. A local soccer team has 5 more games left to play. If it wins its 
game this weekend, then it will play its final 4 games in the upper 
bracket of its league, and if it loses, then it will play its final games in 
the lower bracket. If it plays in the upper bracket, then it will 
independently win each of its games in this bracket with probability .4, 
and if it plays in the lower bracket, then it will independently win each of 
its games with probability .7. If the probability that the team wins its 
game this weekend is .5, what is the probability that it wins at least 3 of 
its final 4 games? 

4.13. Each of the members of a 7-judge panel independently makes a 
correct decision with probability .7. If the panel’s decision is made by 
majority rule, what is the probability that the panel makes the correct 
decision? Given that 4 of the judges agreed, what is the probability that 
the panel made the correct decision? 

4.14. On average, 5.2 hurricanes hit a certain region in a year. What is 
the probability that there will be 3 or fewer hurricanes hitting this year? 
4.15. The number of eggs laid on a tree leaf by an insect of a certain
type is a Poisson random variable with parameter $\lambda$. However, such a
random variable can be observed only if it is positive, since if it is 0,
then we cannot know that such an insect was on the leaf. If we let Y
denote the observed number of eggs, then
\[ P\{Y = i\} = P\{X = i \mid X > 0\} \]
where X is Poisson with parameter $\lambda$. Find E[Y].
4.16. Each of n boys and n girls, independently and randomly, chooses
a member of the other sex. If a boy and girl choose each other, they
become a couple. Number the girls, and let $G_i$ be the event that girl
number i is part of a couple. Let $P_0 = 1 - P\left(\bigcup_{i=1}^{n} G_i\right)$ be the
probability that no couples are formed.

a. What is $P(G_i)$?

b. What is $P(G_i \mid G_j)$?

c. When n is large, approximate $P_0$.

d. When n is large, approximate $P_k$, the probability that exactly k
couples are formed.

e. Use the inclusion-exclusion identity to evaluate $P_0$.


4.17. A total of 2n people, consisting of n married couples, are
randomly divided into n pairs. Arbitrarily number the women, and let $W_i$
denote the event that woman i is paired with her husband.
a. Find $P(W_i)$.
b. For $i \ne j$, find $P(W_i \mid W_j)$.
c. When n is large, approximate the probability that no wife is
paired with her husband.
d. If each pairing must consist of a man and a woman, what does
the problem reduce to?


4.18. A casino patron will continue to make $5 bets on red in roulette 
until she has won 4 of these bets. 

a. What is the probability that she places a total of 9 bets? 

b. What are her expected winnings when she stops? 


Remark: On each bet, she will either win $5 with probability 18/38
or lose $5 with probability 20/38.


4.19. When three friends go for coffee, they decide who will pay the 
check by each flipping a coin and then letting the “odd person” pay. If 
all three flips produce the same result (so that there is no odd person), 
then they make a second round of flips, and they continue to do so until
there is an odd person. What is the probability that
a. exactly 3 rounds of flips are made? 
b. more than 4 rounds are needed? 


4.20. Show that if X is a geometric random variable with parameter p,
then
\[ E[1/X] = \frac{-p \log(p)}{1 - p} \]
Hint: You will need to evaluate an expression of the form $\sum_{i=1}^{\infty} a^i/i$. To
do so, write $a^i/i = \int_0^a x^{i-1}\,dx$, and then interchange the sum and the
integral.
4.21. Suppose that
\[ P\{X = a\} = p, \qquad P\{X = b\} = 1 - p \]

a. Show that $\frac{X - b}{a - b}$ is a Bernoulli random variable.

b. Find Var(X).


4.22. Each game you play is a win with probability p. You plan to play 5 
games, but if you win the fifth game, then you will keep on playing until 
you lose. 

a. Find the expected number of games that you play. 

b. Find the expected number of games that you lose. 


4.23. Balls are randomly withdrawn, one at a time without replacement, 
from an urn that initially has N white and M black balls. Find the 
probability that n white balls are drawn before m black balls, 
$n \le N$, $m \le M$.


4.24. Ten balls are to be distributed among 5 urns, with each ball going
into urn i with probability $p_i$, $\sum_{i=1}^{5} p_i = 1$. Let $X_i$ denote the number of
balls that go into urn i. Assume that events corresponding to the
locations of different balls are independent.
a. What type of random variable is $X_i$? Be as specific as possible.
b. For $i \ne j$, what type of random variable is $X_i + X_j$?
c. Find $P\{X_1 + X_2 + X_3 = 7\}$.


4.25. For the match problem (Example 5m in Chapter 2), find
a. the expected number of matches.
b. the variance of the number of matches.


4.26. Let $\alpha$ be the probability that a geometric random variable X with
parameter p is an even number.

a. Find $\alpha$ by using the identity $\alpha = \sum_{i=1}^{\infty} P\{X = 2i\}$.

b. Find $\alpha$ by conditioning on whether X = 1 or X > 1.


4.27. Two teams will play a series of games, with the winner being the 
first team to win a total of 4 games. Suppose that, independently of 
earlier results, team 1 wins each game it plays with probability p, 
$0 < p < 1$. Let N denote the number of games that are played.

a. Show that $P(N = 6) \ge P(N = 7)$, with equality only when p = 1/2.


b. Give an intuitive explanation for why equality results when 
p=1/2. 
Hint: Consider what needs to be true in order for the number of 
games to be either 6 or 7. 

c. If p = 1/2, find the probability that the team that wins the first 
game wins the series. 


4.28. An urn has n white and m black balls. Balls are randomly 
withdrawn, without replacement, until a total of k, $k \le n$, white balls have
been withdrawn. The random variable X equal to the total number of 
balls that are withdrawn is said to be a negative hypergeometric 
random variable. 

a. Explain how such a random variable differs from a negative 

binomial random variable. 
b. Find P{X =r}. 


Hint for (b): In order for X = r to happen, what must be the results of
the first $r - 1$ withdrawals?
4.29. There are 3 coins which when flipped come up heads, 
respectively, with probabilities 1/3, 1/2, 3/4. One of these coins is 
randomly chosen and continually flipped. 

a. Find the probability that there are a total of 5 heads in the first 8 

flips. 
b. Find the probability that the first head occurs on flip 5. 


4.30. If X is a binomial random variable with parameters n and p, what 
type of random variable is $n - X$?


4.31. Let X be the ith smallest number in a random sample of n of the
numbers 1, ..., n + m. Find the probability mass function of X.

4.32. Balls are randomly removed from an urn consisting of n red and 
m blue balls. Let X denote the number of balls that have to be removed 
until a total of r red balls have been removed. X is said to be a negative 
hypergeometric random variable. 

a. Find the probability mass function of X. 

b. Find the probability mass function of V, equal to the number of 
balls that have to be removed until either r red balls or s blue 
balls have been removed. 

c. Find the probability mass function of Z, equal to the number of 
balls that have to be removed until both at least r red balls and 
at least s blue balls have been removed. 

d. Find the probability that r red balls are removed before s blue 
balls have been removed. 


Chapter 5 Continuous Random Variables 


Contents 


5.1 Introduction 

5.2 Expectation and Variance of Continuous Random Variables 
5.3 The Uniform Random Variable 

5.4 Normal Random Variables 

5.5 Exponential Random Variables 

5.6 Other Continuous Distributions 


5.7 The Distribution of a Function of a Random Variable 


5.1 Introduction 


In Chapter 4, we considered discrete random variables, that is, random variables
whose set of possible values is either finite or countably infinite. However, there also
exist random variables whose set of possible values is uncountable. Two examples
are the time that a train arrives at a specified stop and the lifetime of a transistor. Let
X be such a random variable. We say that X is a continuous† random variable if
there exists a nonnegative function f, defined for all real $x \in (-\infty, \infty)$, having the
property that for any set B of real numbers,*

† Sometimes called absolutely continuous.

(1.1)
\[ P\{X \in B\} = \int_B f(x)\,dx \]

* Actually, for technical reasons, Equation (1.1) is true only for the
measurable sets B, which, fortunately, include all sets of practical
interest.


The function f is called the probability density function of the random variable X . 
(See Figure 5.1.) 


Figure 5.1 Probability density function f. The shaded region between a and b has area $P\{a \le X \le b\}$.

In words, Equation (1.1) states that the probability that X will be in B may be
obtained by integrating the probability density function over the set B. Since X must
assume some value, f must satisfy

\[ 1 = P\{X \in (-\infty, \infty)\} = \int_{-\infty}^{\infty} f(x)\,dx \]


All probability statements about X can be answered in terms of f. For instance, from
Equation (1.1), letting B = [a, b], we obtain

(1.2)
\[ P\{a \le X \le b\} = \int_a^b f(x)\,dx \]

If we let a = b in Equation (1.2), we get

\[ P\{X = a\} = \int_a^a f(x)\,dx = 0 \]

In words, this equation states that the probability that a continuous random variable
will assume any fixed value is zero. Hence, for a continuous random variable,

\[ P\{X < a\} = P\{X \le a\} = F(a) = \int_{-\infty}^{a} f(x)\,dx \]


Example 1a 


Suppose that X is a continuous random variable whose probability density 
function is given by 


\[ f(x) = \begin{cases} C(4x - 2x^2) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases} \]


a. What is the value of C? 
b. Find P{X > 1}. 


Solution 


a. Since f is a probability density function, we must have $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
implying that

\[ C \int_0^2 (4x - 2x^2)\,dx = 1 \]

or

\[ C\left[2x^2 - \frac{2x^3}{3}\right]_{x=0}^{x=2} = 1 \]

or

\[ C = \frac{3}{8} \]

Hence,

b.
\[ P\{X > 1\} = \int_1^{\infty} f(x)\,dx = \frac{3}{8}\int_1^2 (4x - 2x^2)\,dx = \frac{1}{2} \]

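The two answers in Example 1a can be checked numerically. The following is a minimal sketch in Python, not part of the original example; the helper `integrate` (a plain midpoint rule) and the name `unnormalized` are illustrative assumptions, and any standard quadrature routine would serve equally well.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def unnormalized(x):
    # The density of Example 1a without the constant C.
    return 4 * x - 2 * x**2 if 0 < x < 2 else 0.0


# C must make the density integrate to 1 over (0, 2).
C = 1 / integrate(unnormalized, 0, 2)

# P{X > 1} is the integral of the normalized density over (1, 2).
p_gt_1 = C * integrate(unnormalized, 1, 2)

print(C, p_gt_1)
```

Under these assumptions the printed values should agree closely with C = 3/8 and P{X > 1} = 1/2.
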
Example 1b 


The amount of time in hours that a computer functions before breaking down is a 
continuous random variable with probability density function given by 


\[ f(x) = \begin{cases} \lambda e^{-x/100} & x \ge 0 \\ 0 & x < 0 \end{cases} \]


What is the probability that 


a. a computer will function between 50 and 150 hours before breaking 
down? 
b. it will function for fewer than 100 hours? 


Solution 


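a. Since

\[ 1 = \int_{-\infty}^{\infty} f(x)\,dx = \lambda \int_0^{\infty} e^{-x/100}\,dx \]

we obtain

\[ 1 = -\lambda(100)e^{-x/100}\Big|_0^{\infty} = 100\lambda \quad \text{or} \quad \lambda = \frac{1}{100} \]

Hence, the probability that a computer will function between 50 and 150
hours before breaking down is given by

\[ P\{50 < X < 150\} = \int_{50}^{150} \frac{1}{100} e^{-x/100}\,dx = -e^{-x/100}\Big|_{50}^{150} = e^{-1/2} - e^{-3/2} \approx .383 \]

b. Similarly,

\[ P\{X < 100\} = \int_0^{100} \frac{1}{100} e^{-x/100}\,dx = -e^{-x/100}\Big|_0^{100} = 1 - e^{-1} \approx .632 \]

In other words, approximately 63.2 percent of the time, a computer will fail before
registering 100 hours of use.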
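As a quick check of Example 1b, the closed-form distribution function implied by the density, $F(x) = 1 - e^{-x/100}$ for $x \ge 0$, can be evaluated directly. This is a small Python sketch; the function name `exp_cdf` and its `mean` parameter are assumptions made for illustration.

```python
import math


def exp_cdf(x, mean=100.0):
    """F(x) = 1 - exp(-x/mean) for x >= 0; the CDF of the density (1/mean) e^{-x/mean}."""
    return 1.0 - math.exp(-x / mean) if x >= 0 else 0.0


# Part (a): P{50 < X < 150} = F(150) - F(50) = e^{-1/2} - e^{-3/2}
p_50_150 = exp_cdf(150) - exp_cdf(50)

# Part (b): P{X < 100} = F(100) = 1 - e^{-1}
p_lt_100 = exp_cdf(100)

print(round(p_50_150, 3), round(p_lt_100, 3))
```
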


Example 1c 


The lifetime in hours of a certain kind of radio tube is a random variable having a 
probability density function given by 


\[ f(x) = \begin{cases} 0 & x \le 100 \\ \dfrac{100}{x^2} & x > 100 \end{cases} \]

What is the probability that exactly 2 of 5 such tubes in a radio set will have to be
replaced within the first 150 hours of operation? Assume that the events
$E_i$, i = 1, 2, 3, 4, 5, that the ith such tube will have to be replaced within this time are
independent.


Solution 


From the statement of the problem, we have

\[ P(E_i) = \int_0^{150} f(x)\,dx = 100 \int_{100}^{150} x^{-2}\,dx = \frac{1}{3} \]

Hence, from the independence of the events $E_i$, it follows that the desired
probability is

\[ \binom{5}{2}\left(\frac{1}{3}\right)^2\left(\frac{2}{3}\right)^3 = \frac{80}{243} \]
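The arithmetic of Example 1c can be reproduced exactly with rational numbers. The sketch below is illustrative only; it uses the antiderivative $-100/x$ of the density on $x > 100$, and the variable names are assumptions.

```python
from fractions import Fraction
from math import comb

# P(E_i) = 100 * (1/100 - 1/150), from integrating 100/x^2 over (100, 150).
p_replace = 100 * (Fraction(1, 100) - Fraction(1, 150))

# Exactly 2 of the 5 independent tubes fail within 150 hours (binomial probability).
p_exactly_two = comb(5, 2) * p_replace**2 * (1 - p_replace) ** 3

print(p_replace, p_exactly_two)
```
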


The relationship between the cumulative distribution F and the probability density f is
expressed by

\[ F(a) = P\{X \in (-\infty, a]\} = \int_{-\infty}^{a} f(x)\,dx \]

Differentiating both sides of the preceding equation yields

\[ \frac{d}{da} F(a) = f(a) \]

That is, the density is the derivative of the cumulative distribution function. A
somewhat more intuitive interpretation of the density function may be obtained from
Equation (1.2) as follows:

\[ P\left\{a - \frac{\epsilon}{2} \le X \le a + \frac{\epsilon}{2}\right\} = \int_{a-\epsilon/2}^{a+\epsilon/2} f(x)\,dx \approx \epsilon f(a) \]

when $\epsilon$ is small and when $f(\cdot)$ is continuous at x = a. In other words, the probability
that X will be contained in an interval of length $\epsilon$ around the point a is approximately
$\epsilon f(a)$. From this result, we see that f(a) is a measure of how likely it is that the
random variable will be near a.


Example 1d 

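If X is continuous with distribution function $F_X$ and density function $f_X$, find the
density function of Y = 2X.

Solution

We will determine $f_Y$ in two ways. The first way is to derive, and then
differentiate, the distribution function of Y:

\[ \begin{aligned} F_Y(a) &= P\{Y \le a\} \\ &= P\{2X \le a\} \\ &= P\{X \le a/2\} \\ &= F_X(a/2) \end{aligned} \]

Differentiation gives

\[ f_Y(a) = \frac{1}{2} f_X(a/2) \]

Another way to determine $f_Y$ is to note that

\[ \begin{aligned} \epsilon f_Y(a) &\approx P\left\{a - \frac{\epsilon}{2} \le Y \le a + \frac{\epsilon}{2}\right\} \\ &= P\left\{a - \frac{\epsilon}{2} \le 2X \le a + \frac{\epsilon}{2}\right\} \\ &= P\left\{\frac{a}{2} - \frac{\epsilon}{4} \le X \le \frac{a}{2} + \frac{\epsilon}{4}\right\} \\ &\approx \frac{\epsilon}{2} f_X(a/2) \end{aligned} \]

Dividing through by $\epsilon$ gives the same result as before.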


5.2 Expectation and Variance of Continuous Random Variables


In Chapter 4, we defined the expected value of a discrete random variable X by

\[ E[X] = \sum_x x P\{X = x\} \]

If X is a continuous random variable having probability density function f(x), then,
because

\[ f(x)\,dx \approx P\{x \le X \le x + dx\} \quad \text{for } dx \text{ small} \]

it is easy to see that the analogous definition is to define the expected value of X by

\[ E[X] = \int_{-\infty}^{\infty} x f(x)\,dx \]


Example 2a 


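Find E[X] when the density function of X is

\[ f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]

Solution

\[ E[X] = \int x f(x)\,dx = \int_0^1 2x^2\,dx = \frac{2}{3} \]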


Example 2b 


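The density function of X is given by

\[ f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]

Find $E[e^X]$.

Solution

Let $Y = e^X$. We start by determining $F_Y$, the cumulative distribution function of Y.
Now, for $1 \le x \le e$,

\[ \begin{aligned} F_Y(x) &= P\{Y \le x\} \\ &= P\{e^X \le x\} \\ &= P\{X \le \log(x)\} \\ &= \int_0^{\log(x)} f(y)\,dy \\ &= \log(x) \end{aligned} \]

By differentiating $F_Y(x)$, we can conclude that the probability density function of Y
is given by

\[ f_Y(x) = \frac{1}{x} \qquad 1 \le x \le e \]

Hence,

\[ \begin{aligned} E[e^X] = E[Y] &= \int_{-\infty}^{\infty} x f_Y(x)\,dx \\ &= \int_1^e dx \\ &= e - 1 \end{aligned} \]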


Although the method employed in Example 2b to compute the expected value of
a function of X is always applicable, there is, as in the discrete case, an alternative
way of proceeding. The following is a direct analog of Proposition 4.1 of Chapter 4.


Proposition 2.1 


If X is a continuous random variable with probability density function f(x), then,
for any real-valued function g,

\[ E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx \]

An application of Proposition 2.1 to Example 2b yields

\[ \begin{aligned} E[e^X] &= \int_0^1 e^x\,dx \qquad \text{since } f(x) = 1,\ 0 < x < 1 \\ &= e - 1 \end{aligned} \]

which is in accord with the result obtained in that example.
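The agreement between the two routes to $E[e^X]$ can also be seen numerically. The sketch below is an illustration rather than part of the text: `integrate` is an assumed midpoint-rule helper, the first integral applies Proposition 2.1 with $g(x) = e^x$, and the second uses the density $f_Y(y) = 1/y$ on $(1, e)$ derived in Example 2b.

```python
import math


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Proposition 2.1: E[g(X)] = integral of g(x) f(x) dx with g(x) = e^x, f(x) = 1 on (0, 1).
via_proposition = integrate(lambda x: math.exp(x) * 1.0, 0.0, 1.0)

# Example 2b: E[Y] = integral of y f_Y(y) dy with f_Y(y) = 1/y on (1, e).
via_density_of_y = integrate(lambda y: y * (1.0 / y), 1.0, math.e)

print(via_proposition, via_density_of_y, math.e - 1)
```

Both computed values should be close to e - 1; the first route avoids having to derive the density of Y at all, which is the practical appeal of the proposition.
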


The proof of Proposition 2.1 is more involved than that of its discrete random 
variable analog. We will present such a proof under the provision that the random 
variable g(X) is nonnegative. (The general proof, which follows the argument in
the case we present, is indicated in Theoretical Exercises 5.2 and 5.3.)
We will need the following lemma, which is of independent interest. 


Lemma 2.1 


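For a nonnegative random variable Y,

\[ E[Y] = \int_0^{\infty} P\{Y > y\}\,dy \]

Proof We present a proof when Y is a continuous random variable with
probability density function $f_Y$. We have

\[ \int_0^{\infty} P\{Y > y\}\,dy = \int_0^{\infty} \int_y^{\infty} f_Y(x)\,dx\,dy \]

where we have used the fact that $P\{Y > y\} = \int_y^{\infty} f_Y(x)\,dx$. Interchanging the
order of integration in the preceding equation yields

\[ \begin{aligned} \int_0^{\infty} P\{Y > y\}\,dy &= \int_0^{\infty} \left( \int_0^x dy \right) f_Y(x)\,dx \\ &= \int_0^{\infty} x f_Y(x)\,dx \\ &= E[Y] \end{aligned} \]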
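Lemma 2.1 can be illustrated with the lifetime density of Example 1b, for which $P\{Y > y\} = e^{-y/100}$. The Python sketch below is an assumption-laden illustration: `integrate` is a midpoint-rule helper, and the infinite upper limit is truncated at 5000, where the remaining tail is negligible.

```python
import math


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


UPPER = 5000.0  # truncation point standing in for infinity (an assumption)

# Left side of the lemma: E[Y] computed from the density (1/100) e^{-y/100}.
mean_from_density = integrate(lambda y: y * math.exp(-y / 100.0) / 100.0, 0.0, UPPER)

# Right side of the lemma: the integral of the tail probability P{Y > y}.
mean_from_tail = integrate(lambda y: math.exp(-y / 100.0), 0.0, UPPER)

print(mean_from_density, mean_from_tail)
```

Both integrals should come out close to 100, the mean lifetime in hours.
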


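Proof of Proposition 2.1 From Lemma 2.1, for any function g for which $g(x) \ge 0$,

\[ \begin{aligned} E[g(X)] &= \int_0^{\infty} P\{g(X) > y\}\,dy \\ &= \int_0^{\infty} \int_{x: g(x) > y} f(x)\,dx\,dy \\ &= \int_{x: g(x) > 0} \int_0^{g(x)} dy\, f(x)\,dx \\ &= \int_{x: g(x) > 0} g(x) f(x)\,dx \end{aligned} \]

which completes the proof.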


Example 2c 


A stick of length 1 is split at a point U having density function f(u) = 1, 0 < u < 1.
Determine the expected length of the piece that contains the point p, 0 < p < 1.

Solution

Let $L_p(U)$ denote the length of the substick that contains the point p, and note
that

\[ L_p(U) = \begin{cases} 1 - U & U < p \\ U & U > p \end{cases} \]

(See Figure 5.2.) Hence, from Proposition 2.1,

\[ \begin{aligned} E[L_p(U)] &= \int_0^1 L_p(u)\,du \\ &= \int_0^p (1 - u)\,du + \int_p^1 u\,du \\ &= \frac{1}{2} - \frac{(1 - p)^2}{2} + \frac{1}{2} - \frac{p^2}{2} \\ &= \frac{1}{2} + p(1 - p) \end{aligned} \]

Figure 5.2 Substick containing the point p; panel (a) shows the case U < p.

Since p(1 − p) is maximized when $p = \frac{1}{2}$, it is interesting to note that the
expected length of the substick containing the point p is maximized when p is the
midpoint of the original stick.
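The formula $E[L_p(U)] = \frac{1}{2} + p(1 - p)$ can be checked by integrating $L_p(u)$ over (0, 1) numerically. This is a rough Python sketch; the function name `expected_length` and the midpoint grid are illustrative assumptions.

```python
def expected_length(p, n=100_000):
    """Midpoint-rule approximation of the integral of L_p(u) over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        # The piece containing p has length 1 - u when u < p, and u when u > p.
        total += (1.0 - u) if u < p else u
    return total * h


for p in (0.1, 0.3, 0.5, 0.9):
    print(p, expected_length(p), 0.5 + p * (1 - p))
```

The largest value occurs at p = 0.5, in line with the remark that the midpoint maximizes the expected length.
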


Example 2d 


Suppose that if you are s minutes early for an appointment, then you incur the 
cost cs, and if you are s minutes late, then you incur the cost ks. Suppose also 
that the travel time from where you presently are to the location of your 
appointment is a continuous random variable having probability density function 
f. Determine the time at which you should depart if you want to minimize your
expected cost.

Solution

Let X denote the travel time. If you leave t minutes before your appointment, then
your cost, call it $C_t(X)$, is given by

\[ C_t(X) = \begin{cases} c(t - X) & \text{if } X \le t \\ k(X - t) & \text{if } X \ge t \end{cases} \]

Therefore,

\[ \begin{aligned} E[C_t(X)] &= \int_0^{\infty} C_t(x) f(x)\,dx \\ &= \int_0^t c(t - x) f(x)\,dx + \int_t^{\infty} k(x - t) f(x)\,dx \\ &= ct \int_0^t f(x)\,dx - c \int_0^t x f(x)\,dx + k \int_t^{\infty} x f(x)\,dx - kt \int_t^{\infty} f(x)\,dx \end{aligned} \]

The value of t that minimizes $E[C_t(X)]$ can now be obtained by calculus.
Differentiation yields

\[ \begin{aligned} \frac{d}{dt} E[C_t(X)] &= ct f(t) + cF(t) - ct f(t) - kt f(t) + kt f(t) - k[1 - F(t)] \\ &= (k + c)F(t) - k \end{aligned} \]

Equating the rightmost side to zero shows that the minimal expected cost is
obtained when you leave t* minutes before your appointment, where t* satisfies

\[ F(t^*) = \frac{k}{k + c} \]
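To make the condition $F(t^*) = k/(k + c)$ concrete, suppose, purely as an assumption for illustration, that the travel time is exponential with a given mean, so that $F(t) = 1 - e^{-t/\text{mean}}$. The example itself leaves the density f general; the cost values and the mean used below are hypothetical.

```python
import math


def expected_cost(t, c, k, mean):
    """E[C_t(X)] in closed form when X is exponential with the given mean (an assumption)."""
    late = mean * math.exp(-t / mean)   # integral of (x - t) f(x) dx over x > t
    early = t - mean + late             # integral of (t - x) f(x) dx over x < t
    return c * early + k * late


def optimal_departure(c, k, mean):
    """Solve F(t*) = k / (k + c) for F(t) = 1 - exp(-t / mean)."""
    return -mean * math.log(1.0 - k / (k + c))


c, k, mean = 1.0, 3.0, 20.0             # hypothetical costs per minute and mean travel time
t_star = optimal_departure(c, k, mean)
print(t_star, expected_cost(t_star, c, k, mean))
```

With lateness three times as costly as earliness, the sketch leaves enough time that the trip finishes on schedule with probability 3/4, and nearby departure times give a larger expected cost.
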


As in Chapter 4, we can use Proposition 2.1 to show the following.


Corollary 2.1 
If a and b are constants, then 
E[aX + b] = aE[X] +b 
The proof of Corollary 2.1 for a continuous random variable X is the same as
the one given for a discrete random variable. The only modification is that the
sum is replaced by an integral and the probability mass function by a probability
density function.

The variance of a continuous random variable is defined exactly as it is for a
discrete random variable, namely, if X is a random variable with expected value $\mu$,
then the variance of X is defined (for any type of random variable) by

\[ \operatorname{Var}(X) = E[(X - \mu)^2] \]

The alternative formula,

\[ \operatorname{Var}(X) = E[X^2] - (E[X])^2 \]

is established in a manner similar to its counterpart in the discrete case.


Example 2e 


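Find Var(X) for X as given in Example 2a.

Solution

We first compute $E[X^2]$.

\[ \begin{aligned} E[X^2] &= \int_{-\infty}^{\infty} x^2 f(x)\,dx \\ &= \int_0^1 2x^3\,dx \\ &= \frac{1}{2} \end{aligned} \]

Hence, since $E[X] = \frac{2}{3}$, we obtain

\[ \operatorname{Var}(X) = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18} \]

It can be shown that, for constants a and b,

\[ \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X) \]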


The proof mimics the one given for discrete random variables. 


There are several important classes of continuous random variables that appear
frequently in applications of probability; the next few sections are devoted to a study
of some of them.


5.3 The Uniform Random Variable


A random variable is said to be uniformly distributed over the interval (0, 1) if its
probability density function is given by

(3.1)
\[ f(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases} \]

Note that Equation (3.1) defines a density function, since $f(x) \ge 0$ and
$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 dx = 1$. Because f(x) > 0 only when $x \in (0, 1)$, it follows that X
must assume a value in the interval (0, 1). Also, since f(x) is constant for $x \in (0, 1)$, X is
just as likely to be near any value in (0, 1) as it is to be near any other value. To
verify this statement, note that for any 0 < a < b < 1,

\[ P\{a \le X \le b\} = \int_a^b f(x)\,dx = b - a \]

In other words, the probability that X is in any particular subinterval of (0, 1) equals
the length of that subinterval.


In general, we say that X is a uniform random variable on the interval $(\alpha, \beta)$ if the
probability density function of X is given by

(3.2)
\[ f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha < x < \beta \\ 0 & \text{otherwise} \end{cases} \]

Since $F(a) = \int_{-\infty}^{a} f(x)\,dx$, it follows from Equation (3.2) that the distribution
function of a uniform random variable on the interval $(\alpha, \beta)$ is given by

\[ F(a) = \begin{cases} 0 & a \le \alpha \\ \dfrac{a - \alpha}{\beta - \alpha} & \alpha < a < \beta \\ 1 & a \ge \beta \end{cases} \]

Figure 5.3 presents a graph of f(a) and F(a).

Figure 5.3 Graph of (a) f(a) and (b) F(a) for a uniform $(\alpha, \beta)$ random variable.

Example 3a 
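Let X be uniformly distributed over $(\alpha, \beta)$. Find (a) E[X] and (b) Var(X).

Solution

a.
\[ \begin{aligned} E[X] &= \int_{-\infty}^{\infty} x f(x)\,dx \\ &= \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha}\,dx \\ &= \frac{\beta^2 - \alpha^2}{2(\beta - \alpha)} \\ &= \frac{\beta + \alpha}{2} \end{aligned} \]

In words, the expected value of a random variable that is uniformly
distributed over some interval is equal to the midpoint of that interval.

b. To find Var(X), we first calculate $E[X^2]$.

\[ \begin{aligned} E[X^2] &= \int_{\alpha}^{\beta} \frac{1}{\beta - \alpha} x^2\,dx \\ &= \frac{\beta^3 - \alpha^3}{3(\beta - \alpha)} \\ &= \frac{\beta^2 + \alpha\beta + \alpha^2}{3} \end{aligned} \]

Hence,

\[ \begin{aligned} \operatorname{Var}(X) &= \frac{\beta^2 + \alpha\beta + \alpha^2}{3} - \frac{(\alpha + \beta)^2}{4} \\ &= \frac{(\beta - \alpha)^2}{12} \end{aligned} \]

Therefore, the variance of a random variable that is uniformly distributed
over some interval is the square of the length of that interval divided by 12.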
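The two results of Example 3a can be verified with exact rational arithmetic by evaluating the antiderivatives at the endpoints. The following Python sketch is illustrative; the function name `uniform_moments` is an assumption.

```python
from fractions import Fraction


def uniform_moments(alpha, beta):
    """Return (E[X], Var(X)) for X uniform on (alpha, beta), using exact fractions."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    length = beta - alpha
    mean = (beta**2 - alpha**2) / (2 * length)      # integral of x / (beta - alpha)
    second = (beta**3 - alpha**3) / (3 * length)    # integral of x^2 / (beta - alpha)
    return mean, second - mean**2


print(uniform_moments(0, 10))   # midpoint 5 and variance 10^2 / 12 = 25/3
print(uniform_moments(2, 5))    # midpoint 7/2 and variance 3^2 / 12 = 3/4
```
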


Example 3b 


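If X is uniformly distributed over (0, 10), calculate the probability that (a) X < 3,
(b) X > 6, and (c) 3 < X < 8.

Solution

a.
\[ P\{X < 3\} = \int_0^3 \frac{1}{10}\,dx = \frac{3}{10} \]

b.
\[ P\{X > 6\} = \int_6^{10} \frac{1}{10}\,dx = \frac{4}{10} \]

c.
\[ P\{3 < X < 8\} = \int_3^8 \frac{1}{10}\,dx = \frac{1}{2} \]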

Example 3c 


Buses arrive at a specified stop at 15-minute intervals starting at 7 A.M. That is, 
they arrive at 7, 7:15, 7:30, 7:45, and so on. If a passenger arrives at the stop at 
a time that is uniformly distributed between 7 and 7:30, find the probability that he 
waits 


a. less than 5 minutes for a bus;
b. more than 10 minutes for a bus.

Solution

Let X denote the number of minutes past 7 that the passenger arrives at the stop.
Since X is a uniform random variable over the interval (0, 30), it follows that the
passenger will have to wait less than 5 minutes if (and only if) he arrives between
7:10 and 7:15 or between 7:25 and 7:30. Hence, the desired probability for part
(a) is

\[ P\{10 < X < 15\} + P\{25 < X < 30\} = \int_{10}^{15} \frac{1}{30}\,dx + \int_{25}^{30} \frac{1}{30}\,dx = \frac{1}{3} \]

Similarly, he would have to wait more than 10 minutes if he arrives between 7
and 7:05 or between 7:15 and 7:20, so the probability for part (b) is

\[ P\{0 < X < 5\} + P\{15 < X < 20\} = \frac{1}{3} \]

The next example was first considered by the French mathematician Joseph L. F.
Bertrand in 1889 and is often referred to as Bertrand’s paradox. It represents our
initial introduction to a subject commonly referred to as geometrical probability.


Example 3d 


Consider a random chord of a circle. What is the probability that the length of the 
chord will be greater than the side of the equilateral triangle inscribed in that 
circle? 


Solution 


As stated, the problem is incapable of solution because it is not clear what is 
meant by a random chord. To give meaning to this phrase, we shall reformulate 
the problem in two distinct ways. 


The first formulation is as follows: The position of the chord can be determined by
its distance from the center of the circle. This distance can vary between 0 and r,
the radius of the circle. Now, the length of the chord will be greater than the side
of the equilateral triangle inscribed in the circle if the distance from the chord to
the center of the circle is less than r/2. Hence, by assuming that a random chord
is a chord whose distance D from the center of the circle is uniformly distributed
between 0 and r, we see that the probability that the length of the chord is
greater than the side of an inscribed equilateral triangle is

\[ P\left\{D < \frac{r}{2}\right\} = \frac{r/2}{r} = \frac{1}{2} \]

For our second formulation of the problem, consider an arbitrary chord of the
circle; through one end of the chord, draw a tangent. The angle $\theta$ between the
chord and the tangent, which can vary from 0° to 180°, determines the position of
the chord. (See Figure 5.4.) Furthermore, the length of the chord will be
greater than the side of the inscribed equilateral triangle if the angle $\theta$ is between
60° and 120°. Hence, assuming that a random chord is a chord whose angle $\theta$ is
uniformly distributed between 0° and 180°, we see that the desired answer in this
formulation is

\[ P\{60 < \theta < 120\} = \frac{120 - 60}{180} = \frac{1}{3} \]

Figure 5.4

Note that random experiments could be performed in such a way that $\frac{1}{2}$ or $\frac{1}{3}$
would be the correct probability. For instance, if a circular disk of radius r is
thrown on a table ruled with parallel lines a distance 2r apart, then one and only
one of these lines would cross the disk and form a chord. All distances from this
chord to the center of the disk would be equally likely, so that the desired
probability that the chord’s length will be greater than the side of an inscribed
equilateral triangle is $\frac{1}{2}$. In contrast, if the experiment consisted of rotating a
needle freely about a point A on the edge (see Figure 5.4) of the circle, the
desired answer would be $\frac{1}{3}$.
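The two formulations of a "random chord" can be simulated side by side. The Python sketch below is illustrative only: the function names, the seed, and the sample size are assumptions, and the chord lengths use the standard facts that a chord at distance d from the center has length $2\sqrt{r^2 - d^2}$, a chord making angle $\theta$ with the tangent has length $2r \sin\theta$, and the inscribed equilateral triangle has side $\sqrt{3}\,r$.

```python
import math
import random


def chord_longer_by_distance(rng, r=1.0):
    """First formulation: distance from the center is uniform on (0, r)."""
    d = rng.uniform(0.0, r)
    return 2.0 * math.sqrt(r * r - d * d) > math.sqrt(3.0) * r


def chord_longer_by_angle(rng, r=1.0):
    """Second formulation: angle with the tangent is uniform on (0, pi) radians."""
    theta = rng.uniform(0.0, math.pi)
    return 2.0 * r * math.sin(theta) > math.sqrt(3.0) * r


def estimate(trial, n=200_000, seed=0):
    """Monte Carlo estimate of the probability that `trial` returns True."""
    rng = random.Random(seed)
    return sum(trial(rng) for _ in range(n)) / n


print(estimate(chord_longer_by_distance), estimate(chord_longer_by_angle))
```

The estimates should land near 1/2 and 1/3 respectively, which makes the point of the example tangible: the answer depends on how the chord is generated.
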


5.4 Normal Random Variables


We say that X is a normal random variable, or simply that X is normally distributed,
with parameters $\mu$ and $\sigma^2$ if the density of X is given by

\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2/2\sigma^2} \qquad -\infty < x < \infty \]

This density function is a bell-shaped curve that is symmetric about $\mu$. (See Figure
5.5.)


Figure 5.5 Normal density function: (a) $\mu = 0$, $\sigma = 1$; (b) arbitrary $\mu$, $\sigma^2$.

The normal distribution was introduced by the French mathematician Abraham
DeMoivre in 1733, who used it to approximate probabilities associated with binomial
random variables when the binomial parameter n is large. This result was later
extended by Laplace and others and is now encompassed in a probability theorem
known as the central limit theorem, which is discussed in Chapter 8. The central
limit theorem, one of the two most important results in probability theory,† gives a
theoretical base to the often noted empirical observation that, in practice, many
random phenomena obey, at least approximately, a normal probability distribution.
Some examples of random phenomena obeying this behavior are the height of a
man or woman, the velocity in any direction of a molecule in gas, and the error made
in measuring a physical quantity.

† The other is the strong law of large numbers.


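To prove that f(x) is indeed a probability density function, we need to show that

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x - \mu)^2/2\sigma^2}\,dx = 1 \]

Making the substitution $y = (x - \mu)/\sigma$, we see that

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x - \mu)^2/2\sigma^2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\,dy \]

Hence, we must show that

\[ \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \sqrt{2\pi} \]

Toward this end, let $I = \int_{-\infty}^{\infty} e^{-y^2/2}\,dy$. Then

\[ \begin{aligned} I^2 &= \int_{-\infty}^{\infty} e^{-y^2/2}\,dy \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(y^2 + x^2)/2}\,dy\,dx \end{aligned} \]

We now evaluate the double integral by means of a change of variables to polar
coordinates. (That is, let $x = r\cos\theta$, $y = r\sin\theta$, and $dy\,dx = r\,d\theta\,dr$.) Thus,

\[ \begin{aligned} I^2 &= \int_0^{\infty} \int_0^{2\pi} e^{-r^2/2}\, r\,d\theta\,dr \\ &= 2\pi \int_0^{\infty} r e^{-r^2/2}\,dr \\ &= -2\pi e^{-r^2/2}\Big|_0^{\infty} \\ &= 2\pi \end{aligned} \]

Hence, $I = \sqrt{2\pi}$, and the result is proved.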
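The value $I = \sqrt{2\pi}$ can be confirmed numerically. The Python sketch below is an illustration under stated assumptions: `integrate` is a midpoint-rule helper, and the infinite limits are replaced by ±10, beyond which the integrand is smaller than $e^{-50}$.

```python
import math


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# I = integral of e^{-y^2/2}, truncated to [-10, 10] (an assumption; the tails are negligible).
I = integrate(lambda y: math.exp(-y * y / 2.0), -10.0, 10.0)

print(I, math.sqrt(2.0 * math.pi))
```
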
An important fact about normal random variables is that if X is normally distributed
with parameters $\mu$ and $\sigma^2$, then Y = aX + b is normally distributed with parameters
$a\mu + b$ and $a^2\sigma^2$. To prove this statement, suppose that a > 0. (The proof when a < 0
is similar.) Let $F_Y$ denote the cumulative distribution function of Y. Then

\[ \begin{aligned} F_Y(x) &= P\{Y \le x\} \\ &= P\{aX + b \le x\} \\ &= P\left\{X \le \frac{x - b}{a}\right\} \\ &= F_X\left(\frac{x - b}{a}\right) \end{aligned} \]

where $F_X$ is the cumulative distribution function of X. By differentiation, the density
function of Y is then

\[ \begin{aligned} f_Y(x) &= \frac{1}{a} f_X\left(\frac{x - b}{a}\right) \\ &= \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left\{-\left(\frac{x - b}{a} - \mu\right)^2 \Big/ 2\sigma^2\right\} \\ &= \frac{1}{\sqrt{2\pi}\,a\sigma} \exp\left\{-(x - b - a\mu)^2 / 2(a\sigma)^2\right\} \end{aligned} \]

which shows that Y is normal with parameters $a\mu + b$ and $a^2\sigma^2$.


An important implication of the preceding result is that if X is normally distributed with
parameters $\mu$ and $\sigma^2$, then $Z = (X - \mu)/\sigma$ is normally distributed with parameters 0
and 1. Such a random variable is said to be a standard, or a unit, normal random
variable.


We now show that the parameters $\mu$ and $\sigma^2$ of a normal random variable represent,
respectively, its expected value and variance.


Example 4a 


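Find E[X] and Var(X) when X is a normal random variable with parameters $\mu$ and
$\sigma^2$.

Solution

Let us start by finding the mean and variance of the standard normal random
variable $Z = (X - \mu)/\sigma$. We have

\[ \begin{aligned} E[Z] &= \int_{-\infty}^{\infty} x f_Z(x)\,dx \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x e^{-x^2/2}\,dx \\ &= -\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\Big|_{-\infty}^{\infty} \\ &= 0 \end{aligned} \]

Thus,

\[ \operatorname{Var}(Z) = E[Z^2] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx \]

Integration by parts (with u = x and $dv = x e^{-x^2/2}\,dx$) now gives

\[ \begin{aligned} \operatorname{Var}(Z) &= \frac{1}{\sqrt{2\pi}} \left( -x e^{-x^2/2}\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \right) \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \\ &= 1 \end{aligned} \]

Because $X = \mu + \sigma Z$, the preceding yields the results

\[ E[X] = \mu + \sigma E[Z] = \mu \]

and

\[ \operatorname{Var}(X) = \sigma^2 \operatorname{Var}(Z) = \sigma^2 \]

It is customary to denote the cumulative distribution function of a standard normal
random variable by $\Phi(x)$. That is,

\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy \]

The values of $\Phi(x)$ for nonnegative x are given in Table 5.1. For negative values
of x, $\Phi(x)$ can be obtained from the relationship

(4.1)
\[ \Phi(-x) = 1 - \Phi(x) \qquad -\infty < x < \infty \]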

Table 5.1 Area $\Phi(x)$ Under the Standard Normal Curve to the Left of x.

  x    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 .0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 .1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 .2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 .3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 .4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 .5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 .6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 .7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 .8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 .9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964

The proof of Equation (4.1), which follows from the symmetry of the standard
normal density, is left as an exercise. This equation states that if Z is a standard
normal random variable, then

\[ P\{Z \le -x\} = P\{Z > x\} \qquad -\infty < x < \infty \]

Since $Z = (X - \mu)/\sigma$ is a standard normal random variable whenever X is normally
distributed with parameters $\mu$ and $\sigma^2$, it follows that the distribution function of X can
be expressed as

\[ F_X(a) = P\{X \le a\} = P\left(\frac{X - \mu}{\sigma} \le \frac{a - \mu}{\sigma}\right) = \Phi\left(\frac{a - \mu}{\sigma}\right) \]


Example 4b 


If X is anormal random variable with parameters u = 3 and a? = 9, find (a) 
P{2 < X <5}; (b) P{X > 0}; (c) P{|X — 3| > 6}. 


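Solution


a.
$$\begin{aligned}
P\{2 < X < 5\} &= P\left\{\frac{2 - 3}{3} < \frac{X - 3}{3} < \frac{5 - 3}{3}\right\} \\
&= P\left\{-\frac{1}{3} < Z < \frac{2}{3}\right\} \\
&= \Phi\left(\frac{2}{3}\right) - \Phi\left(-\frac{1}{3}\right) \\
&= \Phi\left(\frac{2}{3}\right) - \left[1 - \Phi\left(\frac{1}{3}\right)\right] \approx .3779
\end{aligned}$$

b.
$$\begin{aligned}
P\{X > 0\} &= P\left\{\frac{X - 3}{3} > \frac{0 - 3}{3}\right\} = P\{Z > -1\} \\
&= 1 - \Phi(-1) \\
&= \Phi(1) \\
&\approx .8413
\end{aligned}$$

c.
$$\begin{aligned}
P\{|X - 3| > 6\} &= P\{X > 9\} + P\{X < -3\} \\
&= P\left\{\frac{X - 3}{3} > \frac{9 - 3}{3}\right\} + P\left\{\frac{X - 3}{3} < \frac{-3 - 3}{3}\right\} \\
&= P\{Z > 2\} + P\{Z < -2\} \\
&= 1 - \Phi(2) + \Phi(-2) \\
&= 2[1 - \Phi(2)] \\
&\approx .0456
\end{aligned}$$
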
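
The three probabilities of Example 4b can also be checked numerically. The following is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` in place of the printed table; because it does not round the arguments to two decimals, the answer to part (a) differs from the table-based value in the fourth decimal place.

```python
from statistics import NormalDist

# X ~ Normal(mu = 3, sigma^2 = 9), so the standard deviation is 3.
X = NormalDist(mu=3, sigma=3)

p_a = X.cdf(5) - X.cdf(2)          # P{2 < X < 5}
p_b = 1 - X.cdf(0)                 # P{X > 0}
p_c = (1 - X.cdf(9)) + X.cdf(-3)   # P{|X - 3| > 6}

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```
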
Example 4c 


An examination is frequently regarded as being good (in the sense of 
determining a valid grade spread for those taking it) if the test scores of those 
taking the examination can be approximated by a normal density function. (In 
other words, a graph of the frequency of grade scores should have approximately 
the bell-shaped form of the normal density.) The instructor often uses the test 
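scores to estimate the normal parameters $\mu$ and $\sigma^2$ and then assigns the letter
grade A to those whose test score is greater than $\mu + \sigma$, B to those whose score
is between $\mu$ and $\mu + \sigma$, C to those whose score is between $\mu - \sigma$ and $\mu$, D to
those whose score is between $\mu - 2\sigma$ and $\mu - \sigma$, and F to those getting a score
below $\mu - 2\sigma$. (This strategy is sometimes referred to as grading “on the curve.”)
Since


$$\begin{aligned}
P\{X > \mu + \sigma\} &= P\left\{\frac{X - \mu}{\sigma} > 1\right\} = 1 - \Phi(1) \approx .1587 \\
P\{\mu < X < \mu + \sigma\} &= P\left\{0 < \frac{X - \mu}{\sigma} < 1\right\} = \Phi(1) - \Phi(0) \approx .3413 \\
P\{\mu - \sigma < X < \mu\} &= P\left\{-1 < \frac{X - \mu}{\sigma} < 0\right\} = \Phi(0) - \Phi(-1) \approx .3413 \\
P\{\mu - 2\sigma < X < \mu - \sigma\} &= P\left\{-2 < \frac{X - \mu}{\sigma} < -1\right\} = \Phi(2) - \Phi(1) \approx .1359 \\
P\{X < \mu - 2\sigma\} &= P\left\{\frac{X - \mu}{\sigma} < -2\right\} = \Phi(-2) \approx .0228
\end{aligned}$$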


it follows that approximately 16 percent of the class will receive an A grade on 
the examination, 34 percent a B grade, 34 percent a C grade, and 14 percent a D 
grade; 2 percent will fail. 


Example 4d 


An expert witness in a paternity suit testifies that the length (in days) of human 
gestation is approximately normally distributed with parameters $\mu = 270$ and
$\sigma^2 = 100$. The defendant in the suit is able to prove that he was out of the country
during a period that began 290 days before the birth of the child and ended 240 
days before the birth. If the defendant was, in fact, the father of the child, what is 
the probability that the mother could have had the very long or very short 
gestation indicated by the testimony? 


Solution 


Let X denote the length of the gestation, and assume that the defendant is the 
father. Then the probability that the birth could occur within the indicated period is 


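$$\begin{aligned}
P\{X > 290 \text{ or } X < 240\} &= P\{X > 290\} + P\{X < 240\} \\
&= P\left\{\frac{X - 270}{10} > 2\right\} + P\left\{\frac{X - 270}{10} < -3\right\} \\
&= 1 - \Phi(2) + 1 - \Phi(3) \\
&\approx .0241
\end{aligned}$$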


Example 4e 


Suppose that a binary message, either 0 or 1, must be transmitted by wire from
location A to location B. However, the data sent over the wire are subject to a
channel noise disturbance, so, to reduce the possibility of error, the value 2 is
sent over the wire when the message is 1 and the value -2 is sent when the
message is 0. If x, $x = \pm 2$, is the value sent at location A, then R, the value
received at location B, is given by R = x + N, where N is the channel noise
disturbance. When the message is received at location B, the receiver decodes it
according to the following rule:


If $R \ge .5$, then 1 is concluded.
If $R < .5$, then 0 is concluded.


Because the channel noise is often normally distributed, we will determine the 
error probabilities when N is a standard normal random variable. 


Two types of errors can occur: One is that the message 1 can be incorrectly 
determined to be 0, and the other is that 0 can be incorrectly determined to be 1. 
The first type of error will occur if the message is 1 and $2 + N < .5$, whereas the
second will occur if the message is 0 and $-2 + N \ge .5$. Hence,


$$P\{\text{error} \mid \text{message is 1}\} = P\{N < -1.5\} = 1 - \Phi(1.5) \approx .0668$$

and

$$P\{\text{error} \mid \text{message is 0}\} = P\{N \ge 2.5\} = 1 - \Phi(2.5) \approx .0062$$

Example 4f 


Value at Risk (VAR) has become a key concept in financial calculations. The VAR 
of an investment is defined as that value v such that there is only a 1 percent 
chance that the loss from the investment will be greater than v. If X, the gain 
from an investment, is a normal random variable with mean $\mu$ and variance $\sigma^2$,
then because the loss is equal to the negative of the gain, the VAR of such an
investment is that value v such that


$$.01 = P\{-X > v\}$$


Using that $-X$ is normal with mean $-\mu$ and variance $\sigma^2$, we see that


$$.01 = P\left\{\frac{-X + \mu}{\sigma} > \frac{v + \mu}{\sigma}\right\} = 1 - \Phi\left(\frac{v + \mu}{\sigma}\right)$$


Because, as indicated by Table 5.1, $\Phi(2.33) = .99$, we see that


$$\frac{v + \mu}{\sigma} = 2.33$$


That is,


$$v = \text{VAR} = 2.33\sigma - \mu$$


Consequently, among a set of investments all of whose gains are normally
distributed, the investment having the smallest VAR is the one having the largest
value of $\mu - 2.33\sigma$.
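
The VAR formula of Example 4f is easy to evaluate for several investments at once. The sketch below is illustrative only: the function name `value_at_risk` and the two investments with their means and standard deviations are made-up assumptions, and the quantile is taken from `statistics.NormalDist` rather than rounded to 2.33.

```python
from statistics import NormalDist

def value_at_risk(mu, sigma, level=0.01):
    """Return v with P{-X > v} = level when the gain X is Normal(mu, sigma^2)."""
    z = NormalDist().inv_cdf(1 - level)   # close to 2.33 when level = 0.01
    return z * sigma - mu

# Hypothetical investments: name -> (mean gain, standard deviation of gain).
investments = {"A": (10.0, 4.0), "B": (12.0, 6.0)}

var_by_name = {name: value_at_risk(m, s) for name, (m, s) in investments.items()}
best = min(var_by_name, key=var_by_name.get)   # smallest VAR
print(var_by_name, best)
```

In this made-up comparison, investment B has the larger mean gain, but A has the smaller VAR because its value of $\mu - 2.33\sigma$ is larger.
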


5.4.1 The Normal Approximation to the Binomial 
Distribution 


An important result in probability theory known as the DeMoivre-Laplace limit
theorem states that when n is large, a binomial random variable with parameters n
and p will have approximately the same distribution as a normal random variable with
the same mean and variance as the binomial. This result was proved originally for
the special case of $p = \frac{1}{2}$ by DeMoivre in 1733 and was then extended to general p
by Laplace in 1812. It formally states that if we “standardize” the binomial by first
subtracting its mean np and then dividing the result by its standard deviation
$\sqrt{np(1 - p)}$, then the distribution function of this standardized random variable
(which has mean 0 and variance 1) will converge to the standard normal distribution
function as $n \to \infty$.


The DeMoivre-Laplace Limit Theorem


If $S_n$ denotes the number of successes that occur when n independent trials,
each resulting in a success with probability p, are performed, then, for any
$a < b$,


$$P\left\{a \le \frac{S_n - np}{\sqrt{np(1 - p)}} \le b\right\} \to \Phi(b) - \Phi(a)$$


as $n \to \infty$.


Because the preceding theorem is only a special case of the central limit theorem,
which is presented in Chapter 8, we shall not present a proof.


Note that we now have two possible approximations to binomial probabilities: the 
Poisson approximation, which is good when n is large and p is small, and the normal 
approximation, which can be shown to be quite good when np(1 — p) is large. (See 
Figure 5.6.) [The normal approximation will, in general, be quite good for values
of n satisfying $np(1 - p) \ge 10$.]


Figure 5.6 The probability mass function of a binomial (n, p) random variable
becomes more and more “normal” as n becomes larger and larger.


Example 4g


Let X be the number of times that a fair coin that is flipped 40 times lands on 
heads. Find the probability that X = 20. Use the normal approximation and then 
compare it with the exact solution. 


Solution 


To employ the normal approximation, note that because the binomial is a discrete 
integer-valued random variable, whereas the normal is a continuous random 
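variable, it is best to write $P\{X = i\}$ as $P\{i - 1/2 < X < i + 1/2\}$ before applying
the normal approximation (this is called the continuity correction). Doing so gives


$$\begin{aligned}
P\{X = 20\} &= P\{19.5 < X < 20.5\} \\
&= P\left\{\frac{19.5 - 20}{\sqrt{10}} < \frac{X - 20}{\sqrt{10}} < \frac{20.5 - 20}{\sqrt{10}}\right\} \\
&\approx P\left\{-.16 < \frac{X - 20}{\sqrt{10}} < .16\right\} \\
&\approx \Phi(.16) - \Phi(-.16) \approx .1272
\end{aligned}$$


The exact result is


$$P\{X = 20\} = \binom{40}{20}\left(\frac{1}{2}\right)^{40} \approx .1254$$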


Example 4h 


The ideal size of a first-year class at a particular college is 150 students. The 
college, knowing from past experience that, on the average, only 30 percent of 
those accepted for admission will actually attend, uses a policy of approving the 
applications of 450 students. Compute the probability that more than 150 first- 
year students attend this college. 


Solution 


If X denotes the number of students who attend, then X is a binomial random 
variable with parameters n = 450 and p = .3. Using the continuity correction, we 
see that the normal approximation yields 


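$$\begin{aligned}
P\{X \ge 150.5\} &= P\left\{\frac{X - (450)(.3)}{\sqrt{450(.3)(.7)}} \ge \frac{150.5 - (450)(.3)}{\sqrt{450(.3)(.7)}}\right\} \\
&\approx 1 - \Phi(1.59) \\
&\approx .0559
\end{aligned}$$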


Hence, less than 6 percent of the time do more than 150 of the first 450 accepted 
actually attend. (What independence assumptions have we made?) 
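
To see how good the continuity-corrected normal approximation is in Example 4h, it can be set beside the exact binomial tail. This is a rough sketch using only the Python standard library; the variable names are illustrative, and the approximation here uses the unrounded z-value rather than the table value at 1.59.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 450, 0.3
mean = n * p
sd = sqrt(n * p * (1 - p))

# Normal approximation with continuity correction: P{X > 150} = P{X >= 150.5}.
approx = 1 - NormalDist(mean, sd).cdf(150.5)

# Exact binomial tail: P{X >= 151}.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(151, n + 1))

print(round(approx, 4), round(exact, 4))
```
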


Example 4i 


To determine the effectiveness of a certain diet in reducing the amount of 
cholesterol in the bloodstream, 100 people are put on the diet. After they have 
been on the diet for a sufficient length of time, their cholesterol count will be 
taken. The nutritionist running this experiment has decided to endorse the diet if 
at least 65 percent of the people have a lower cholesterol count after going on 
the diet. What is the probability that the nutritionist endorses the new diet if, in 
fact, it has no effect on the cholesterol level? 


Solution 


Let us assume that if the diet has no effect on the cholesterol count, then, strictly 
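by chance, each person’s count will be lower than it was before the diet with
probability $\frac{1}{2}$. Hence, if X is the number of people whose count is lowered, then
the probability that the nutritionist will endorse the diet when it actually has no
effect on the cholesterol count is


$$\begin{aligned}
\sum_{i=65}^{100} \binom{100}{i}\left(\frac{1}{2}\right)^{100} &= P\{X \ge 64.5\} \\
&= P\left\{\frac{X - (100)\left(\frac{1}{2}\right)}{\sqrt{100\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)}} \ge 2.9\right\} \\
&\approx 1 - \Phi(2.9) \\
&\approx .0019
\end{aligned}$$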


Example 4j 


Fifty-two percent of the residents of New York City are in favor of outlawing 
cigarette smoking on university campuses. Approximate the probability that more 
than 50 percent of a random sample of n people from New York are in favor of 
this prohibition when


a. n = 11
b. n = 101
c. n = 1001


How large would n have to be to make this probability exceed .95? 


Solution 


Let N denote the number of residents of New York City. To answer the preceding 
question, we must first understand that a random sample of size n is a sample
such that the n people were chosen in such a manner that each of the $\binom{N}{n}$
subsets of n people had the same chance of being the chosen subset.
Consequently, $S_n$, the number of people in the sample who are in favor of the
smoking prohibition, is a hypergeometric random variable. That is, $S_n$ has the
same distribution as the number of white balls obtained when n balls are chosen
from an urn of N balls, of which .52N are white. But because N and .52N are both
large in comparison with the sample size n, it follows from the binomial
approximation to the hypergeometric (see Section 4.8.3) that the distribution
of $S_n$ is closely approximated by a binomial distribution with parameters n and
p = .52. The normal approximation to the binomial distribution then shows that


$$\begin{aligned}
P\{S_n > .5n\} &= P\left\{\frac{S_n - .52n}{\sqrt{n(.52)(.48)}} > \frac{.5n - .52n}{\sqrt{n(.52)(.48)}}\right\} \\
&= P\left\{\frac{S_n - .52n}{\sqrt{n(.52)(.48)}} > -.04\sqrt{n}\right\} \\
&\approx \Phi(.04\sqrt{n})
\end{aligned}$$


Thus,


$$P\{S_n > .5n\} \approx \begin{cases} \Phi(.1328) = .5528, & \text{if } n = 11 \\ \Phi(.4020) = .6562, & \text{if } n = 101 \\ \Phi(1.2665) = .8973, & \text{if } n = 1001 \end{cases}$$


In order for this probability to be at least .95, we would need $\Phi(.04\sqrt{n}) > .95$.
Because $\Phi(x)$ is an increasing function and $\Phi(1.645) = .95$, this means that


$$.04\sqrt{n} > 1.645$$


or


$$n \ge 1691.266$$


That is, the sample size would have to be at least 1692. 
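
The numbers in Example 4j can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions: the helper name `prob_majority` is made up, the normal distribution function comes from `statistics.NormalDist`, and the minimum sample size uses the text's rounded constants .04 and 1.645.

```python
from math import ceil, sqrt
from statistics import NormalDist

p = 0.52
phi = NormalDist().cdf   # standard normal distribution function

def prob_majority(n):
    """Normal approximation to P{S_n > .5n} for S_n ~ Binomial(n, p)."""
    return phi((p - 0.5) * n / sqrt(n * p * (1 - p)))

for n in (11, 101, 1001):
    print(n, round(prob_majority(n), 4))

# Smallest integer n with .04 * sqrt(n) > 1.645 (rounded constants from the text).
n_min = ceil((1.645 / 0.04) ** 2)
print(n_min)
```
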


Historical notes concerning the normal distribution 


The normal distribution was introduced by the French mathematician Abraham 
DeMoivre in 1733. DeMoivre, who used this distribution to approximate 
probabilities connected with coin tossing, called it the exponential bell-shaped 
curve. Its usefulness, however, became truly apparent only in 1809, when the 
famous German mathematician Karl Friedrich Gauss used it as an integral part 
of his approach to predicting the location of astronomical entities. As a result, it 
became common after this time to call it the Gaussian distribution. 

During the mid- to late 19th century, however, most statisticians started to 
believe that the majority of data sets would have histograms conforming to the 
Gaussian bell-shaped form. Indeed, it came to be accepted that it was “normal” 
for any well-behaved data set to follow this curve. As a result, following the lead 
of the British statistician Karl Pearson, people began referring to the Gaussian 
curve by calling it simply the normal curve. (A partial explanation as to why so 
many data sets conform to the normal curve is provided by the central limit 
theorem, which is presented in Chapter 8 __ .) 


Abraham DeMoivre (1667-1754) 




Today there is no shortage of statistical consultants, many of whom ply their 
trade in the most elegant of settings. However, the first of their breed worked, in 
the early years of the 18th century, out of a dark, grubby betting shop in Long 
Acres, London, known as Slaughter’s Coffee House. He was Abraham 
DeMoivre, a Protestant refugee from Catholic France, and, for a price, he would 
compute the probability of gambling bets in all types of games of chance. 


Although DeMoivre, the discoverer of the normal curve, made his living at the 
coffee shop, he was a mathematician of recognized abilities. Indeed, he was a 
member of the Royal Society and was reported to be an intimate of Isaac 
Newton. 


Listen to Karl Pearson imagining DeMoivre at work at Slaughter’s Coffee
House: “I picture DeMoivre working at a dirty table in the coffee house with a
broken-down gambler beside him and Isaac Newton walking through the crowd 
to his corner to fetch out his friend. It would make a great picture for an inspired 
artist.” 


Karl Friedrich Gauss 


Karl Friedrich Gauss (1777-1855), one of the earliest users of the normal 
curve, was one of the greatest mathematicians of all time. Listen to the words 
of the well-known mathematical historian E. T. Bell, as expressed in his 1954 
book Men of Mathematics: In a chapter entitled “The Prince of Mathematicians,” 
he writes, “Archimedes, Newton, and Gauss; these three are in a class by 
themselves among the great mathematicians, and it is not for ordinary mortals 
to attempt to rank them in order of merit. All three started tidal waves in both 
pure and applied mathematics. Archimedes esteemed his pure mathematics 
more highly than its applications;
Newton appears to have found the chief justification for his mathematical
inventions in the scientific uses to which he put them; while Gauss declared it 
was all one to him whether he worked on the pure or on the applied side.” 


5.5 Exponential Random Variables


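A continuous random variable whose probability density function is given, for some
$\lambda > 0$, by


$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$


is said to be an exponential random variable (or, more simply, is said to be
exponentially distributed) with parameter $\lambda$. The cumulative distribution function F(a)
of an exponential random variable is given by


$$\begin{aligned}
F(a) &= P\{X \le a\} \\
&= \int_0^a \lambda e^{-\lambda x}\,dx \\
&= -e^{-\lambda x}\Big|_0^a \\
&= 1 - e^{-\lambda a}, \qquad a \ge 0
\end{aligned}$$


Note that $F(\infty) = \int_0^\infty \lambda e^{-\lambda x}\,dx = 1$, as, of course, it must. The parameter $\lambda$ will now
be shown to equal the reciprocal of the expected value.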


Example 5a 

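Let X be an exponential random variable with parameter $\lambda$. Calculate (a) E[X]
and (b) Var(X).

Solution


(a) Since the density function is given by


$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$


we obtain, for n > 0,


$$E[X^n] = \int_0^\infty x^n \lambda e^{-\lambda x}\,dx$$


Integrating by parts (with $\lambda e^{-\lambda x}\,dx = dv$ and $u = x^n$) yields


$$\begin{aligned}
E[X^n] &= -x^n e^{-\lambda x}\Big|_0^\infty + \int_0^\infty e^{-\lambda x}\, n x^{n-1}\,dx \\
&= 0 + \frac{n}{\lambda}\int_0^\infty \lambda e^{-\lambda x} x^{n-1}\,dx \\
&= \frac{n}{\lambda} E[X^{n-1}]
\end{aligned}$$


Letting n = 1 and then n = 2 gives


$$E[X] = \frac{1}{\lambda}, \qquad E[X^2] = \frac{2}{\lambda} E[X] = \frac{2}{\lambda^2}$$


(b) Hence,


$$\text{Var}(X) = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$


Thus, the mean of the exponential is the reciprocal of its parameter $\lambda$, and the
variance is the mean squared.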
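
The recursion $E[X^n] = (n/\lambda)E[X^{n-1}]$ obtained in Example 5a can be coded directly and compared with a simulation. This is a minimal sketch: the rate value, the sample size, and the seed are arbitrary choices for illustration, and `random.expovariate` is the standard-library exponential sampler.

```python
import random

def exp_moment(n, lam):
    """E[X^n] for an exponential with rate lam, via E[X^n] = (n/lam) E[X^(n-1)]."""
    return 1.0 if n == 0 else (n / lam) * exp_moment(n - 1, lam)

lam = 2.0                                # illustrative rate, not from the text
mean = exp_moment(1, lam)                # equals 1/lam
var = exp_moment(2, lam) - mean ** 2     # equals 1/lam^2

random.seed(0)
sample = [random.expovariate(lam) for _ in range(200_000)]
sample_mean = sum(sample) / len(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / len(sample)

print(mean, var, round(sample_mean, 3), round(sample_var, 3))
```
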


In practice, the exponential distribution often arises as the distribution of the amount 
of time until some specific event occurs. For instance, the amount of time (starting 
from now) until an earthquake occurs, or until a new war breaks out, or until a 
telephone call you receive turns out to be a wrong number are all random variables 
that tend in practice to have exponential distributions. (For a theoretical explanation
of this phenomenon, see Section 4.7.)


Example 5b 

Suppose that the length of a phone call in minutes is an exponential random
variable with parameter $\lambda = \frac{1}{10}$. If someone arrives immediately ahead of you at
a public telephone booth, find the probability that you will have to wait


a. more than 10 minutes;
b. between 10 and 20 minutes.


Solution


Let X denote the length of the call made by the person in the booth. Then the
desired probabilities are


a.
$$P\{X > 10\} = 1 - F(10) = e^{-1} \approx .368$$

b.
$$P\{10 < X < 20\} = F(20) - F(10) = e^{-1} - e^{-2} \approx .233$$


We say that a nonnegative random variable X is memoryless if


$$P\{X > s + t \mid X > t\} = P\{X > s\} \qquad \text{for all } s, t \ge 0 \tag{5.1}$$


If we think of X as being the lifetime of some instrument, Equation (5.1) states
that the probability that the instrument survives for at least s + t hours, given that it
has survived t hours, is the same as the initial probability that it survives for at least s
hours. In other words, if the instrument is alive at age t, the distribution of the
remaining amount of time that it survives is the same as the original lifetime
distribution. (That is, it is as if the instrument does not “remember” that it has already
been in use for a time t.)


Equation (5.1) is equivalent to


$$\frac{P\{X > s + t,\, X > t\}}{P\{X > t\}} = P\{X > s\}$$


or


$$P\{X > s + t\} = P\{X > s\}P\{X > t\} \tag{5.2}$$


Since Equation (5.2) is satisfied when X is exponentially distributed (for
$e^{-\lambda(s+t)} = e^{-\lambda s}e^{-\lambda t}$), it follows that exponentially distributed random variables are
memoryless.


Example 5c 


Consider a post office that is staffed by two clerks. Suppose that when Mr. Smith 
enters the system, he discovers that Ms. Jones is being served by one of the 
clerks and Mr. Brown by the other. Suppose also that Mr. Smith is told that his 
service will begin as soon as either Ms. Jones or Mr. Brown leaves. If the amount 
of time that a clerk spends with a customer is exponentially distributed with 
parameter $\lambda$, what is the probability that of the three customers, Mr. Smith is the
last to leave the post office? 


Solution 


The answer is obtained by reasoning as follows: Consider the time at which Mr. 
Smith first finds a free clerk. At this point, either Ms. Jones or Mr. Brown would 
have just left, and the other one would still be in service. However, because the 
exponential is memoryless, it follows that the additional amount of time that this 
other person (either Ms. Jones or Mr. Brown) would still have to spend in the post 
office is exponentially distributed with parameter $\lambda$. That is, it is the same as if
service for that person were just starting at this point. Hence, by symmetry, the
probability that the remaining person finishes before Smith leaves must equal $\frac{1}{2}$.
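
The symmetry argument of Example 5c can be checked by simulation. The sketch below is only an illustration: the function name, the rate 1.0, the number of trials, and the seed are all arbitrary assumptions. It also uses memorylessness itself to model the remaining service times of Ms. Jones and Mr. Brown at the moment Mr. Smith arrives as fresh exponentials.

```python
import random

def smith_leaves_last(lam, rng):
    """Simulate one visit; return True if Mr. Smith is the last of the three to leave."""
    jones = rng.expovariate(lam)    # remaining service time of Ms. Jones
    brown = rng.expovariate(lam)    # remaining service time of Mr. Brown
    start = min(jones, brown)       # Smith is served once the first clerk is free
    smith_done = start + rng.expovariate(lam)
    return smith_done > max(jones, brown)

rng = random.Random(1)
trials = 100_000
estimate = sum(smith_leaves_last(1.0, rng) for _ in range(trials)) / trials
print(round(estimate, 3))
```

The estimate should be close to 1/2 for any rate, since the answer does not depend on $\lambda$.
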


It turns out that not only is the exponential distribution memoryless, but it is also the
unique distribution possessing this property. To see this, suppose that X is
memoryless and let $\bar{F}(x) = P\{X > x\}$. Then, by Equation (5.2),


$$\bar{F}(s + t) = \bar{F}(s)\bar{F}(t)$$


That is, $\bar{F}(\cdot)$ satisfies the functional equation


$$g(s + t) = g(s)g(t)$$


However, it turns out that the only right continuous solution of this functional equation
is†


$$g(x) = e^{-\lambda x} \tag{5.3}$$


and, since a distribution function is always right continuous, we must have


$$\bar{F}(x) = e^{-\lambda x} \quad \text{or} \quad F(x) = P\{X \le x\} = 1 - e^{-\lambda x}$$


which shows that X is exponentially distributed.


† One can prove Equation (5.3) as follows: If $g(s + t) = g(s)g(t)$, then

$$g\left(\frac{2}{n}\right) = g\left(\frac{1}{n} + \frac{1}{n}\right) = g^2\left(\frac{1}{n}\right)$$

and repeating this yields $g(m/n) = g^m(1/n)$. Also,

$$g(1) = g\left(\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}\right) = g^n\left(\frac{1}{n}\right) \quad \text{or} \quad g\left(\frac{1}{n}\right) = (g(1))^{1/n}$$

Hence, $g(m/n) = (g(1))^{m/n}$, which, since g is right continuous,
implies that $g(x) = (g(1))^x$. Because $g(1) = \left(g\left(\frac{1}{2}\right)\right)^2 \ge 0$, we obtain
$g(x) = e^{-\lambda x}$, where $\lambda = -\log(g(1))$.


Example 5d 


Suppose that the number of miles that a car can run before its battery wears out 
is exponentially distributed with an average value of 10,000 miles. If a person 
desires to take a 5000-mile trip, what is the probability that he or she will be able 
to complete the trip without having to replace the car battery? What can be said
when the distribution is not exponential?


Solution 


It follows by the memoryless property of the exponential distribution that the
remaining lifetime (in thousands of miles) of the battery is exponential with
parameter $\lambda = \frac{1}{10}$. Hence, the desired probability is


$$P\{\text{remaining lifetime} > 5\} = 1 - F(5) = e^{-5\lambda} = e^{-1/2} \approx .607$$


However, if the lifetime distribution F is not exponential, then the relevant
probability is


$$P\{\text{lifetime} > t + 5 \mid \text{lifetime} > t\} = \frac{1 - F(t + 5)}{1 - F(t)}$$

where t is the number of miles that the battery had been in use prior to the start 
of the trip. Therefore, if the distribution is not exponential, additional information 
is needed (namely, the value of t) before the desired probability can be 
calculated. 


A variation of the exponential distribution is the distribution of a random variable that 
is equally likely to be either positive or negative and whose absolute value is 
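exponentially distributed with parameter $\lambda$, $\lambda > 0$. Such a random variable is said to
have a Laplace distribution,* and its density is given by


$$f(x) = \frac{1}{2}\lambda e^{-\lambda |x|}, \qquad -\infty < x < \infty$$


* It also is sometimes called the double exponential random variable.


Its distribution function is given by


$$F(x) = \begin{cases} \dfrac{1}{2}\displaystyle\int_{-\infty}^{x} \lambda e^{\lambda y}\,dy & x < 0 \\[2ex] \dfrac{1}{2}\displaystyle\int_{-\infty}^{0} \lambda e^{\lambda y}\,dy + \dfrac{1}{2}\displaystyle\int_{0}^{x} \lambda e^{-\lambda y}\,dy & x > 0 \end{cases}
\;=\; \begin{cases} \dfrac{1}{2}e^{\lambda x} & x < 0 \\[2ex] 1 - \dfrac{1}{2}e^{-\lambda x} & x > 0 \end{cases}$$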


Example 5e 


Consider again Example 4e, which supposes that a binary message is to be
transmitted from A to B, with the value 2 being sent when the message is 1 and
-2 when it is 0. However, suppose now that rather than being a standard normal
random variable, the channel noise N is a Laplacian random variable with
parameter $\lambda = 1$. Suppose again that if R is the value received at location B, then
the message is decoded as follows:


If $R \ge .5$, then 1 is concluded.
If $R < .5$, then 0 is concluded.


In this case, where the noise is Laplacian with parameter $\lambda = 1$, the two types of
errors will have probabilities given by


$$P\{\text{error} \mid \text{message 1 is sent}\} = P\{N < -1.5\} = \frac{1}{2}e^{-1.5} \approx .1116$$

$$P\{\text{error} \mid \text{message 0 is sent}\} = P\{N \ge 2.5\} = \frac{1}{2}e^{-2.5} \approx .041$$


On comparing this with the results of Example 4e, we see that the error
probabilities are higher when the noise is Laplacian with $\lambda = 1$ than when it is a
standard normal variable.
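
The comparison between the two noise models can be tabulated with a short script. This is a sketch under stated assumptions: `laplace_cdf` is an illustrative helper implementing the Laplace distribution function with rate `lam`, and the normal values come from `statistics.NormalDist` rather than from the printed table.

```python
from math import exp
from statistics import NormalDist

def laplace_cdf(x, lam=1.0):
    """Distribution function of a Laplace (double exponential) random variable."""
    return 0.5 * exp(lam * x) if x < 0 else 1 - 0.5 * exp(-lam * x)

normal_cdf = NormalDist().cdf

# Message 1 sends the value 2: an error occurs when N < -1.5.
# Message 0 sends the value -2: an error occurs when N >= 2.5.
err1_laplace = laplace_cdf(-1.5)
err0_laplace = 1 - laplace_cdf(2.5)
err1_normal = normal_cdf(-1.5)
err0_normal = 1 - normal_cdf(2.5)

print(round(err1_laplace, 4), round(err0_laplace, 4))
print(round(err1_normal, 4), round(err0_normal, 4))
```
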


5.5.1 Hazard Rate Functions


Consider a positive continuous random variable X that we interpret as being the
lifetime of some item. Let X have distribution function F and density f. The hazard
rate (sometimes called the failure rate) function $\lambda(t)$ of F is defined by


$$\lambda(t) = \frac{f(t)}{\bar{F}(t)}, \qquad \text{where } \bar{F} = 1 - F$$


To interpret $\lambda(t)$, suppose that the item has survived for a time t and we desire the
probability that it will not survive for an additional time dt. That is, consider
$P\{X \in (t, t + dt) \mid X > t\}$. Now,


$$\begin{aligned}
P\{X \in (t, t + dt) \mid X > t\} &= \frac{P\{X \in (t, t + dt),\, X > t\}}{P\{X > t\}} \\
&= \frac{P\{X \in (t, t + dt)\}}{P\{X > t\}} \\
&\approx \frac{f(t)}{\bar{F}(t)}\,dt
\end{aligned}$$


Thus, $\lambda(t)$ represents the conditional probability intensity that a t-unit-old item will
fail.


Suppose now that the lifetime distribution is exponential. Then, by the memoryless
property, it follows that the distribution of remaining life for a t-year-old item is the
same as that for a new item. Hence, $\lambda(t)$ should be constant. In fact, this checks out,
since


$$\lambda(t) = \frac{f(t)}{\bar{F}(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$


Thus, the failure rate function for the exponential distribution is constant. The
parameter $\lambda$ is often referred to as the rate of the distribution.


It turns out that the failure rate function $\lambda(s)$, $s \ge 0$, uniquely determines the
distribution function F. To prove this, we integrate $\lambda(s)$ from 0 to t to obtain


$$\begin{aligned}
\int_0^t \lambda(s)\,ds &= \int_0^t \frac{f(s)}{1 - F(s)}\,ds \\
&= -\log(1 - F(s))\Big|_0^t \\
&= -\log(1 - F(t)) + \log(1 - F(0)) \\
&= -\log(1 - F(t))
\end{aligned}$$


where the second equality used that $f(s) = \frac{d}{ds}F(s)$ and the final equality used that
F(0) = 0. Solving the preceding equation for F(t) gives


$$F(t) = 1 - \exp\left\{-\int_0^t \lambda(s)\,ds\right\} \tag{5.4}$$


Hence, a distribution function of a positive continuous random variable can be
specified by giving its hazard rate function. For instance, if a random variable has a
linear hazard rate function, that is, if


$$\lambda(t) = a + bt$$


then its distribution function is given by


$$F(t) = 1 - e^{-at - bt^2/2}$$


and differentiation yields its density, namely,


$$f(t) = (a + bt)e^{-(at + bt^2/2)}, \qquad t \ge 0$$


When a = 0, the preceding equation is known as the Rayleigh density function.
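
Equation (5.4) also gives a practical recipe: integrate the hazard rate numerically and exponentiate. The following sketch does this with the trapezoidal rule and compares the result with the closed form for a linear hazard; the function name `cdf_from_hazard`, the step count, and the coefficients a and b are illustrative assumptions.

```python
from math import exp

def cdf_from_hazard(hazard, t, steps=10_000):
    """F(t) = 1 - exp(-integral of hazard over [0, t]), using the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * h) for i in range(1, steps))
    return 1 - exp(-total * h)

a, b = 0.5, 2.0    # illustrative coefficients of the linear hazard a + b*t
t = 1.3

numeric = cdf_from_hazard(lambda s: a + b * s, t)
closed_form = 1 - exp(-(a * t + b * t ** 2 / 2))
print(numeric, closed_form)
```

With a constant hazard the same function returns the exponential distribution function, in line with the discussion of the constant failure rate above.
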


Example 5f 


One often hears that the death rate of a person who smokes is, at each age, 
twice that of a nonsmoker. What does this mean? Does it mean that a 
nonsmoker has twice the probability of surviving a given number of years as 
does a smoker of the same age? 


Solution 


If $\lambda_s(t)$ denotes the hazard rate of a smoker of age t and $\lambda_n(t)$ that of a
nonsmoker of age t, then the statement at issue is equivalent to the statement
that


$$\lambda_s(t) = 2\lambda_n(t)$$


The probability that an A-year-old nonsmoker will survive until age B, A < B, is


$$\begin{aligned}
P\{\text{A-year-old nonsmoker reaches age B}\}
&= P\{\text{nonsmoker's lifetime} > B \mid \text{nonsmoker's lifetime} > A\} \\
&= \frac{1 - F_{\text{non}}(B)}{1 - F_{\text{non}}(A)} \\
&= \frac{\exp\left\{-\int_0^B \lambda_n(t)\,dt\right\}}{\exp\left\{-\int_0^A \lambda_n(t)\,dt\right\}} \qquad \text{from (5.4)} \\
&= \exp\left\{-\int_A^B \lambda_n(t)\,dt\right\}
\end{aligned}$$


whereas the corresponding probability for a smoker is, by the same reasoning,


$$\begin{aligned}
P\{\text{A-year-old smoker reaches age B}\} &= \exp\left\{-\int_A^B \lambda_s(t)\,dt\right\} \\
&= \exp\left\{-2\int_A^B \lambda_n(t)\,dt\right\} \\
&= \left[\exp\left\{-\int_A^B \lambda_n(t)\,dt\right\}\right]^2
\end{aligned}$$


In other words, for two people of the same age, one of whom is a smoker and the
other a nonsmoker, the probability that the smoker survives to any given age is
the square (not one-half) of the corresponding probability for a nonsmoker. For
instance, if $\lambda_n(t) = \frac{1}{30}$, $50 \le t \le 60$, then the probability that a 50-year-old
nonsmoker reaches age 60 is $e^{-1/3} \approx .7165$, whereas the corresponding
probability for a smoker is $e^{-2/3} \approx .5134$.
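
The numerical illustration at the end of Example 5f takes only a few lines to verify. This small sketch assumes the constant nonsmoker hazard of 1/30 per year between ages 50 and 60 stated in the example; the variable names are illustrative.

```python
from math import exp

rate_non = 1 / 30          # nonsmoker hazard rate on [50, 60], from the example
age_a, age_b = 50, 60

# Survival probability is exp(-integral of the hazard from age_a to age_b).
p_non = exp(-rate_non * (age_b - age_a))
p_smoker = exp(-2 * rate_non * (age_b - age_a))   # smoker hazard is twice as large

print(round(p_non, 4), round(p_smoker, 4))
```
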


Equation (5.4) can be used to show that only exponential random variables are
memoryless. For if a random variable has a memoryless distribution, then the
remaining life of an s-year-old must be the same for all s. That is, if X is memoryless,
then $\lambda(s) = c$. But, by Equation (5.4), this implies that the distribution function of
X is $F(t) = 1 - e^{-ct}$, showing that X is exponential with rate c.


5.6 Other Continuous Distributions


5.6.1 The Gamma Distribution 


A random variable is said to have a gamma distribution with parameters $(\alpha, \lambda)$, $\lambda > 0$,
$\alpha > 0$, if its density function is given by


$$f(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x}(\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} & x \ge 0 \\[2ex] 0 & x < 0 \end{cases}$$


where $\Gamma(\alpha)$, called the gamma function, is defined as


$$\Gamma(\alpha) = \int_0^\infty e^{-y} y^{\alpha - 1}\,dy$$


Integration of $\Gamma(\alpha)$ by parts yields


$$\begin{aligned}
\Gamma(\alpha) &= -e^{-y} y^{\alpha - 1}\Big|_0^\infty + \int_0^\infty e^{-y}(\alpha - 1)y^{\alpha - 2}\,dy \\
&= (\alpha - 1)\int_0^\infty e^{-y} y^{\alpha - 2}\,dy \\
&= (\alpha - 1)\Gamma(\alpha - 1)
\end{aligned} \tag{6.1}$$


For integral values of $\alpha$, say, $\alpha = n$, we obtain, by applying Equation (6.1)
repeatedly,


$$\begin{aligned}
\Gamma(n) &= (n - 1)\Gamma(n - 1) \\
&= (n - 1)(n - 2)\Gamma(n - 2) \\
&= \cdots \\
&= (n - 1)(n - 2) \cdots 3 \cdot 2\,\Gamma(1)
\end{aligned}$$


Since $\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1$, it follows that, for integral values of n,


$$\Gamma(n) = (n - 1)!$$


When $\alpha$ is a positive integer, say, $\alpha = n$, the gamma distribution with parameters
$(\alpha, \lambda)$ often arises in practice as the distribution of the amount of time one has to
wait until a total of n events has occurred. More specifically, if events are occurring
randomly and in accordance with the three axioms of Section 4.7, then it turns
out that the amount of time one has to wait until a total of n events has occurred will
be a gamma random variable with parameters $(n, \lambda)$. To prove this, let $T_n$ denote the
time at which the nth event occurs, and note that $T_n$ is less than or equal to t if and
only if the number of events that have occurred by time t is at least n. That is, with
N(t) equal to the number of events in [0, t],


$$\begin{aligned}
P\{T_n \le t\} &= P\{N(t) \ge n\} \\
&= \sum_{j=n}^{\infty} P\{N(t) = j\} \\
&= \sum_{j=n}^{\infty} \frac{e^{-\lambda t}(\lambda t)^j}{j!}
\end{aligned}$$


where the final identity follows because the number of events in [0, t] has a Poisson
distribution with parameter $\lambda t$. Differentiation of the preceding now yields the density
function of $T_n$:


$$\begin{aligned}
f(t) &= \sum_{j=n}^{\infty} \frac{e^{-\lambda t}\, j(\lambda t)^{j-1}\lambda}{j!} - \sum_{j=n}^{\infty} \frac{\lambda e^{-\lambda t}(\lambda t)^j}{j!} \\
&= \sum_{j=n}^{\infty} \frac{\lambda e^{-\lambda t}(\lambda t)^{j-1}}{(j - 1)!} - \sum_{j=n}^{\infty} \frac{\lambda e^{-\lambda t}(\lambda t)^j}{j!} \\
&= \frac{\lambda e^{-\lambda t}(\lambda t)^{n-1}}{(n - 1)!}
\end{aligned}$$


Hence, $T_n$ has the gamma distribution with parameters $(n, \lambda)$. (This distribution is
often referred to in the literature as the n-Erlang distribution.) Note that when n = 1,
this distribution reduces to the exponential distribution.
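
The identity $P\{T_n \le t\} = P\{N(t) \ge n\}$ can be illustrated numerically. The sketch below evaluates the Poisson sum through its complement and compares it with a simulation in which $T_n$ is built as a sum of n independent exponential waiting times. That construction is an assumption of this sketch (it is the standard description of such an event process, not something derived in the passage above), and the values of n, the rate, t, and the seed are arbitrary.

```python
import random
from math import exp, factorial

def erlang_cdf(n, lam, t):
    """P{T_n <= t} = P{N(t) >= n} = 1 - sum over j < n of e^(-lam t)(lam t)^j / j!."""
    return 1 - sum(exp(-lam * t) * (lam * t) ** j / factorial(j) for j in range(n))

n, lam, t = 3, 2.0, 1.5    # illustrative values
rng = random.Random(42)
trials = 100_000

# Assumed model: T_n is the sum of n independent exponential(lam) waiting times.
hits = sum(
    sum(rng.expovariate(lam) for _ in range(n)) <= t
    for _ in range(trials)
)
empirical = hits / trials
print(round(empirical, 3), round(erlang_cdf(n, lam, t), 3))
```
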


The gamma distribution with $\lambda = \frac{1}{2}$ and $\alpha = n/2$, n a positive integer, is called the $\chi_n^2$
(read “chi-squared”) distribution with n degrees of freedom. The chi-squared
distribution often arises in practice as the distribution of the error involved in
attempting to hit a target in n-dimensional space when each coordinate error is
normally distributed. This distribution will be studied in Chapter 6, where its
relation to the normal distribution is detailed.


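Example 6a


Let X be a gamma random variable with parameters $\alpha$ and $\lambda$. Calculate (a) E[X]
and (b) Var(X).


Solution


a.
$$\begin{aligned}
E[X] &= \frac{1}{\Gamma(\alpha)}\int_0^\infty \lambda x e^{-\lambda x}(\lambda x)^{\alpha - 1}\,dx \\
&= \frac{1}{\lambda\Gamma(\alpha)}\int_0^\infty \lambda e^{-\lambda x}(\lambda x)^{\alpha}\,dx \\
&= \frac{\Gamma(\alpha + 1)}{\lambda\Gamma(\alpha)} \\
&= \frac{\alpha}{\lambda} \qquad \text{by Equation (6.1)}
\end{aligned}$$

b. By first calculating $E[X^2]$, we can show that

$$\text{Var}(X) = \frac{\alpha}{\lambda^2}$$

The details are left as an exercise.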


5.6.2 The Weibull Distribution 


The Weibull distribution is widely used in engineering practice due to its versatility. It
was originally proposed for the interpretation of fatigue data, but now its use has
been extended to many other engineering problems. In particular, it is widely used in
the field of life phenomena as the distribution of the lifetime of some object,
especially when the “weakest link” model is appropriate for the object. That is,
consider an object consisting of many parts, and suppose that the object
experiences death (failure) when any of its parts fails. It has been shown (both
theoretically and empirically) that under these conditions, a Weibull distribution
provides a close approximation to the distribution of the lifetime of the item.


The Weibull distribution function has the form


$$F(x) = \begin{cases} 0 & x \le v \\[1ex] 1 - \exp\left\{-\left(\dfrac{x - v}{\alpha}\right)^{\beta}\right\} & x > v \end{cases} \tag{6.2}$$


A random variable whose cumulative distribution function is given by Equation
(6.2) is said to be a Weibull random variable with parameters $v$, $\alpha$, and $\beta$.
Differentiation yields the density:


$$f(x) = \begin{cases} 0 & x \le v \\[1ex] \dfrac{\beta}{\alpha}\left(\dfrac{x - v}{\alpha}\right)^{\beta - 1}\exp\left\{-\left(\dfrac{x - v}{\alpha}\right)^{\beta}\right\} & x > v \end{cases}$$


5.6.3 The Cauchy Distribution 


A random variable is said to have a Cauchy distribution with parameter $\theta$,
$-\infty < \theta < \infty$, if its density is given by


$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty$$


Example 6b 


Suppose that a narrow-beam flashlight is spun around its center, which is located
a unit distance from the x-axis. (See Figure 5.7.) Consider the point X at which
the beam intersects the x-axis when the flashlight has stopped spinning. (If the
beam is not pointing toward the x-axis, repeat the experiment.)


Figure 5.7


As indicated in Figure 5.7, the point X is determined by the angle $\theta$ between
the flashlight and the y-axis, which, from the physical situation, appears to be
uniformly distributed between $-\pi/2$ and $\pi/2$. The distribution function of X is
thus given by


$$\begin{aligned}
F(x) &= P\{X \le x\} \\
&= P\{\tan\theta \le x\} \\
&= P\{\theta \le \tan^{-1} x\} \\
&= \frac{1}{2} + \frac{1}{\pi}\tan^{-1} x
\end{aligned}$$


where the last equality follows since $\theta$, being uniform over $(-\pi/2, \pi/2)$, has
distribution


$$P\{\theta \le a\} = \frac{a - (-\pi/2)}{\pi} = \frac{1}{2} + \frac{a}{\pi}, \qquad -\frac{\pi}{2} < a < \frac{\pi}{2}$$


Hence, the density function of X is given by


$$f(x) = \frac{d}{dx}F(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty$$


and we see that X has the Cauchy distribution.†


† That $\frac{d}{dx}(\tan^{-1} x) = 1/(1 + x^2)$ can be seen as follows: If
$y = \tan^{-1} x$, then $\tan y = x$, so

$$1 = \frac{d}{dx}(\tan y) = \frac{d}{dy}(\tan y)\frac{dy}{dx} = \frac{d}{dy}\left(\frac{\sin y}{\cos y}\right)\frac{dy}{dx} = \left(\frac{\cos^2 y + \sin^2 y}{\cos^2 y}\right)\frac{dy}{dx}$$

or

$$\frac{dy}{dx} = \frac{\cos^2 y}{\sin^2 y + \cos^2 y} = \frac{1}{\tan^2 y + 1} = \frac{1}{x^2 + 1}$$
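
The flashlight experiment of Example 6b is straightforward to simulate: draw the angle uniformly and take its tangent. The sketch below compares the empirical distribution of the intersection point with $\frac{1}{2} + \frac{1}{\pi}\tan^{-1}x$ at a few points; the seed, the number of trials, and the test points are arbitrary choices for illustration.

```python
import random
from math import atan, pi, tan

rng = random.Random(7)
trials = 100_000

# Angle uniform on (-pi/2, pi/2); the beam meets the x-axis at X = tan(angle).
xs = [tan(rng.uniform(-pi / 2, pi / 2)) for _ in range(trials)]

def cauchy_cdf(x):
    """Distribution function of the Cauchy distribution with parameter 0."""
    return 0.5 + atan(x) / pi

for x in (-2.0, 0.0, 1.0):
    empirical = sum(v <= x for v in xs) / trials
    print(x, round(empirical, 3), round(cauchy_cdf(x), 3))
```
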


5.6.4 The Beta Distribution 


A random variable is said to have a beta distribution if its density is given by 


1 
fx) = 4 BH) 
0 otherwise 


x2-1(4— x)?" O<x<l 


where 
1 
Beat) = | xa 
0 


The beta distribution can be used to model a random phenomenon whose set of 
possible values is some finite interval [c, d]—which, by letting c denote the origin and 
taking d — c as a unit measurement, can be transformed into the interval [0, 1]. 


When a = b, the beta density is symmetric about $\frac{1}{2}$, giving more and more weight to
regions about $\frac{1}{2}$ as the common value a increases. When a = b = 1, the beta
distribution reduces to the uniform (0, 1) distribution. (See Figure 5.8.) When
b > a, the density is skewed to the left (in the sense that smaller values become
more likely), and it is skewed to the right when a > b. (See Figure 5.9.)


Figure 5.8 Beta densities with parameters (a, b) when a = b.


Figure 5.9 Beta densities with parameters (a, b) when a/(a + b) = 1/20.


The relationship


$$B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)} \tag{6.3}$$


can be shown to exist between


$$B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$$


and the gamma function.


Using Equation (6.3) along with the identity $\Gamma(x+1) = x\Gamma(x)$, which was given
in Equation (6.1), it follows that


$$\frac{B(a+1,b)}{B(a,b)} = \frac{\Gamma(a+1)\,\Gamma(b)}{\Gamma(a+b+1)}\cdot\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} = \frac{a}{a+b}$$


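The preceding enables us to easily derive the mean and variance of a beta random
variable with parameters a and b. For if X is such a random variable, then


$$E[X] = \frac{1}{B(a,b)}\int_0^1 x^{a}(1-x)^{b-1}\,dx = \frac{B(a+1,b)}{B(a,b)} = \frac{a}{a+b}$$


Similarly, it follows that


$$\begin{aligned} E[X^2] &= \frac{1}{B(a,b)}\int_0^1 x^{a+1}(1-x)^{b-1}\,dx \\ &= \frac{B(a+2,b)}{B(a,b)} \\ &= \frac{B(a+2,b)}{B(a+1,b)}\cdot\frac{B(a+1,b)}{B(a,b)} \\ &= \frac{(a+1)a}{(a+b+1)(a+b)} \end{aligned}$$


The identity $\operatorname{Var}(X) = E[X^2] - (E[X])^2$ now yields


$$\operatorname{Var}(X) = \frac{a(a+1)}{(a+b)(a+b+1)} - \left(\frac{a}{a+b}\right)^2 = \frac{ab}{(a+b)^2(a+b+1)}$$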


Remark A verification of Equation (6.3) appears in Example 7c of Chapter 6.
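The moment formulas for the beta distribution can be checked numerically. The sketch below is illustrative only: it evaluates B(a, b) through the gamma function as in Equation (6.3), compares it with a direct midpoint-rule approximation of the defining integral, and forms the mean and variance from ratios of beta functions. The function names and the midpoint rule are assumptions made for this example, and the numerical integral is intended only for a ≥ 1 and b ≥ 1, where the integrand is bounded.

```python
import math


def beta_function(a, b):
    """B(a, b) computed from the gamma function, as in Equation (6.3)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_integral(a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of x^(a-1) (1-x)^(b-1) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x ** (a - 1) * (1 - x) ** (b - 1)
    return total * h


def beta_mean(a, b):
    """E[X] = B(a+1, b) / B(a, b)."""
    return beta_function(a + 1, b) / beta_function(a, b)


def beta_variance(a, b):
    """Var(X) = B(a+2, b) / B(a, b) - (E[X])^2."""
    second_moment = beta_function(a + 2, b) / beta_function(a, b)
    return second_moment - beta_mean(a, b) ** 2


if __name__ == "__main__":
    a, b = 2.5, 4.0
    print(beta_mean(a, b), a / (a + b))
    print(beta_variance(a, b), a * b / ((a + b) ** 2 * (a + b + 1)))
```

Each printed pair should agree up to floating-point rounding, matching the closed forms a/(a + b) and ab/((a + b)²(a + b + 1)).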


5.6.5 The Pareto Distribution 


If X is an exponential random variable with rate $\lambda$ and a > 0, then

$$Y = ae^{X}$$

is said to be a Pareto random variable with parameters a and $\lambda$. The parameter
$\lambda > 0$ is called the index parameter, and a is called the minimum parameter (because
$P\{Y > a\} = 1$). The distribution function of Y is derived as follows: For $y \ge a$,

$$\begin{aligned} P\{Y > y\} &= P\{ae^{X} > y\} \\ &= P\{e^{X} > y/a\} \\ &= P\{X > \log(y/a)\} \\ &= e^{-\lambda\log(y/a)} \\ &= e^{-\log((y/a)^{\lambda})} \\ &= (a/y)^{\lambda} \end{aligned}$$


Hence, the distribution function of Y is


$$F_Y(y) = 1 - P\{Y > y\} = 1 - a^{\lambda}y^{-\lambda}, \qquad y \ge a$$


Differentiating the distribution function yields the density function of Y:

$$f_Y(y) = \lambda a^{\lambda} y^{-(\lambda+1)}, \qquad y \ge a$$


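When $\lambda \le 1$ it is easily checked that $E[Y] = \infty$. When $\lambda > 1$,


$$E[Y] = \int_a^\infty \lambda a^{\lambda} y^{-\lambda}\,dy = \lambda a^{\lambda}\,\frac{a^{1-\lambda}}{\lambda-1} = \frac{\lambda}{\lambda-1}\,a$$


$E[Y^2]$ will be finite only when $\lambda > 2$. In this case,


$$E[Y^2] = \int_a^\infty \lambda a^{\lambda} y^{1-\lambda}\,dy = \lambda a^{\lambda}\,\frac{a^{2-\lambda}}{\lambda-2} = \frac{\lambda}{\lambda-2}\,a^2$$


Hence, when $\lambda > 2$


$$\operatorname{Var}(Y) = \frac{\lambda a^2}{\lambda-2} - \frac{\lambda^2 a^2}{(\lambda-1)^2} = \frac{\lambda a^2}{(\lambda-2)(\lambda-1)^2}$$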


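Remarks (a) We could also have derived the moments of Y by using the
representation $Y = ae^{X}$, where X is exponential with rate $\lambda$. This yields, for $\lambda > n$,


$$E[Y^n] = a^n E[e^{nX}] = a^n\int_0^\infty e^{nx}\lambda e^{-\lambda x}\,dx = a^n\int_0^\infty \lambda e^{-(\lambda-n)x}\,dx = \frac{\lambda a^n}{\lambda-n}$$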


(b) Where the density function f(y) of the Pareto is positive (that is, when y > a) it is 
a constant times a power of y, and for this reason it is called a power law density. 


(c) The Pareto distribution has been found to be useful in applications relating to 
such things as 


i. the income or wealth of members of a population; 
ii. the file size of internet traffic (under the TCP protocol); 
iii. the time to complete a job assigned to a supercomputer;
iv. the size of a meteorite; 
v. the yearly maximum one day rainfalls in different regions. 


Further properties of the Pareto distribution will be developed in later chapters. 
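The representation $Y = ae^{X}$ also gives a direct way to simulate a Pareto random variable. The sketch below is an illustration under stated assumptions: the function names, the seed, and the parameter values a = 2 and λ = 3 are chosen only for this example. It draws X from an exponential distribution, forms Y, and compares the simulated tail probability and mean with $(a/y)^{\lambda}$ and $\lambda a/(\lambda - 1)$.

```python
import math
import random


def pareto_sample(a, lam, rng):
    """Draw Y = a * exp(X), where X is exponential with rate lam."""
    return a * math.exp(rng.expovariate(lam))


def pareto_tail(y, a, lam):
    """P{Y > y} = (a / y)**lam for y >= a, and 1 for y < a."""
    return 1.0 if y < a else (a / y) ** lam


def pareto_moment(n, a, lam):
    """E[Y**n] = lam * a**n / (lam - n); infinite unless lam > n."""
    if lam <= n:
        return math.inf
    return lam * a ** n / (lam - n)


if __name__ == "__main__":
    a, lam = 2.0, 3.0
    rng = random.Random(1)  # seeded so that runs are reproducible
    ys = [pareto_sample(a, lam, rng) for _ in range(100_000)]
    print(sum(1 for y in ys if y > 4.0) / len(ys), pareto_tail(4.0, a, lam))
    print(sum(ys) / len(ys), pareto_moment(1, a, lam))
```

Because the variance is finite only when λ > 2, a sample mean is a reliable check only for such λ; for smaller index parameters the simulated averages can fluctuate widely.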


5.7 The Distribution of a Function of a 
Random Variable 


Often, we know the probability distribution of a random variable and are interested in 
determining the distribution of some function of it. For instance, suppose that we 
know the distribution of X and want to find the distribution of g(X). To do so, it is 
necessary to express the event that $g(X) \le y$ in terms of X being in some set. We
illustrate with the following examples. 


Example 7a 


Let X be uniformly distributed over (0, 1). We obtain the distribution of the
random variable Y, defined by $Y = X^n$, as follows: For $0 \le y \le 1$,


$$\begin{aligned} F_Y(y) &= P\{Y \le y\} \\ &= P\{X^n \le y\} \\ &= P\{X \le y^{1/n}\} \\ &= F_X(y^{1/n}) \\ &= y^{1/n} \end{aligned}$$


For instance, the density function of Y is given by

$$f_Y(y) = \begin{cases} \dfrac{1}{n}\,y^{1/n-1} & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$


Example 7b 


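If X is a continuous random variable with probability density $f_X$, then the
distribution of $Y = X^2$ is obtained as follows: For $y \ge 0$,


$$\begin{aligned} F_Y(y) &= P\{Y \le y\} \\ &= P\{X^2 \le y\} \\ &= P\{-\sqrt{y} \le X \le \sqrt{y}\} \\ &= F_X(\sqrt{y}) - F_X(-\sqrt{y}) \end{aligned}$$


Differentiation yields

$$f_Y(y) = \frac{1}{2\sqrt{y}}\left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right]$$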


Example 7c 


If X has a probability density $f_X$, then $Y = |X|$ has a density function that is
obtained as follows: For $y \ge 0$,


$$\begin{aligned} F_Y(y) &= P\{Y \le y\} \\ &= P\{|X| \le y\} \\ &= P\{-y \le X \le y\} \\ &= F_X(y) - F_X(-y) \end{aligned}$$


Hence, on differentiation, we obtain


$$f_Y(y) = f_X(y) + f_X(-y), \qquad y \ge 0$$


The method employed in Examples 7a through 7c can be used to prove
Theorem 7.1.


Theorem 7.1


Let X be a continuous random variable having probability density function $f_X$.
Suppose that g(x) is a strictly monotonic (increasing or decreasing),
differentiable (and thus continuous) function of x. Then the random variable Y
defined by Y = g(X) has a probability density function given by

$$f_Y(y) = \begin{cases} f_X[g^{-1}(y)]\left|\dfrac{d}{dy}g^{-1}(y)\right| & \text{if } y = g(x) \text{ for some } x \\ 0 & \text{if } y \ne g(x) \text{ for all } x \end{cases}$$


where $g^{-1}(y)$ is defined to equal that value of x such that g(x) = y.
We shall prove Theorem 7.1 when g(x) is an increasing function.


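Proof Suppose that y = g(x) for some x. Then, with Y = g(X),


$$\begin{aligned} F_Y(y) &= P\{g(X) \le y\} \\ &= P\{X \le g^{-1}(y)\} \\ &= F_X(g^{-1}(y)) \end{aligned}$$


Differentiation gives

$$f_Y(y) = f_X(g^{-1}(y))\,\frac{d}{dy}g^{-1}(y)$$

which agrees with Theorem 7.1, since $g^{-1}(y)$ is nondecreasing, so its derivative
is nonnegative.


When $y \ne g(x)$ for any x, then $F_Y(y)$ is either 0 or 1, and in either case $f_Y(y) = 0$.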
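The formula of Theorem 7.1 translates directly into a small numerical routine. The following is a sketch under stated assumptions, not a general-purpose implementation: the caller supplies the density of X and the inverse function $g^{-1}$, the derivative of $g^{-1}$ is approximated by a central difference with a hypothetical step size h, and y is assumed to lie strictly inside the range of g. All function names are illustrative.

```python
import math


def transformed_density(f_x, g_inverse, y, h=1e-6):
    """Density of Y = g(X) at y for a strictly monotonic, differentiable g.

    Implements f_Y(y) = f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|, with the derivative
    of g_inverse approximated by a central difference of step h.
    """
    derivative = (g_inverse(y + h) - g_inverse(y - h)) / (2 * h)
    return f_x(g_inverse(y)) * abs(derivative)


def uniform_density(x):
    """Density of a uniform (0, 1) random variable."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


if __name__ == "__main__":
    # Increasing case: Y = X**3 with X uniform on (0, 1); exact density (1/3) y**(-2/3).
    y = 0.5
    print(transformed_density(uniform_density, lambda t: t ** (1 / 3), y),
          (1 / 3) * y ** (-2 / 3))
    # Decreasing case: Y = -log(X); the inverse is exp(-y) and the exact density is exp(-y).
    y = 1.2
    print(transformed_density(uniform_density, lambda t: math.exp(-t), y),
          math.exp(-y))
```

The absolute value is what lets the same routine handle the decreasing case, where the derivative of $g^{-1}$ is negative.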


Example 7d 


Let X be a continuous nonnegative random variable with density function f, and
let $Y = X^n$. Find $f_Y$, the probability density function of Y.


Solution


If $g(x) = x^n$, then

$$g^{-1}(y) = y^{1/n}$$

and

$$\frac{d}{dy}\{g^{-1}(y)\} = \frac{1}{n}\,y^{1/n-1}$$

Hence, from Theorem 7.1, we obtain, for $y \ge 0$,


$$f_Y(y) = \frac{1}{n}\,y^{1/n-1} f(y^{1/n})$$


For n = 2, this gives


$$f_Y(y) = \frac{1}{2\sqrt{y}}\,f(\sqrt{y})$$


which (since $X \ge 0$) is in agreement with the result of Example 7b.


Example 7e The Lognormal Distribution 


If X is a normal random variable with mean $\mu$ and variance $\sigma^2$, then the random
variable


$$Y = e^{X}$$


is said to be a lognormal random variable with parameters $\mu$ and $\sigma^2$. Thus, a
random variable Y is lognormal if log(Y) is a normal random variable. The
lognormal is often used as the distribution of the ratio of the price of a security at
the end of one day to its price at the end of the prior day. That is, if $S_n$ is the price
of some security at the end of day n, then it is often supposed that $\frac{S_n}{S_{n-1}}$ is a
lognormal random variable, implying that $X \equiv \log\left(\frac{S_n}{S_{n-1}}\right)$ is normal. Thus, to
assume that $\frac{S_n}{S_{n-1}}$ is lognormal is to assume that


$$S_n = S_{n-1}e^{X}$$

where X is normal.


Let us now use Theorem 7.1 to derive the density of a lognormal random
variable Y with parameters $\mu$ and $\sigma^2$. Because $Y = e^{X}$, where X is normal with
mean $\mu$ and variance $\sigma^2$, we need to determine the inverse of the function
$g(x) = e^{x}$. Because


$$y = g(g^{-1}(y)) = e^{g^{-1}(y)}$$

we obtain upon taking logarithms that

$$g^{-1}(y) = \log(y)$$


Using that $\frac{d}{dy}g^{-1}(y) = 1/y$, Theorem 7.1 yields the density:


$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma y}\exp\left\{-(\log(y)-\mu)^2/2\sigma^2\right\}, \qquad y > 0$$
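The lognormal density can be checked against its distribution function, which is $P\{Y \le y\} = P\{X \le \log y\}$ with X normal. The sketch below is an illustration only: it uses Python's standard-library `statistics.NormalDist` for the normal distribution function, and the function names and the parameter values in the usage lines are assumptions made for the example. A numerical derivative of the distribution function should match the density obtained from Theorem 7.1.

```python
import math
from statistics import NormalDist


def lognormal_density(y, mu, sigma):
    """Density of Y = exp(X) with X normal (mean mu, standard deviation sigma)."""
    if y <= 0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma * y)


def lognormal_cdf(y, mu, sigma):
    """P{Y <= y} = P{X <= log y} for y > 0, and 0 otherwise."""
    if y <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(y))


if __name__ == "__main__":
    mu, sigma, y, h = 0.2, 0.5, 1.5, 1e-5
    # Central-difference derivative of the distribution function at y.
    numeric = (lognormal_cdf(y + h, mu, sigma) - lognormal_cdf(y - h, mu, sigma)) / (2 * h)
    print(numeric, lognormal_density(y, mu, sigma))
```

Note that `sigma` here is the standard deviation σ, whereas the text parameterizes the lognormal by the variance σ².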


Summary

A random variable X is continuous if there is a nonnegative function f, called the
probability density function of X, such that, for any set B,


$$P\{X \in B\} = \int_B f(x)\,dx$$

If X is continuous, then its distribution function F will be differentiable and


$$\frac{d}{dx}F(x) = f(x)$$


The expected value of a continuous random variable X is defined by

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$$

A useful identity is that for any function g,

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$


As in the case of a discrete random variable, the variance of X is defined by


$$\operatorname{Var}(X) = E[(X - E[X])^2]$$


A random variable X is said to be uniform over the interval (a, b) if its probability
density function is given by


$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

Its expected value and variance are

$$E[X] = \frac{a+b}{2} \qquad \operatorname{Var}(X) = \frac{(b-a)^2}{12}$$


A random variable X is said to be normal with parameters $\mu$ and $\sigma^2$ if its probability
density function is given by


$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty$$


It can be shown that

$$\mu = E[X] \qquad \sigma^2 = \operatorname{Var}(X)$$

If X is normal with mean $\mu$ and variance $\sigma^2$, then Z, defined by


$$Z = \frac{X-\mu}{\sigma}$$


is normal with mean 0 and variance 1. Such a random variable is said to be a
standard normal random variable. Probabilities about X can be expressed in terms of
probabilities about the standard normal variable Z, whose probability distribution
function can be obtained either from Table 5.1, the normal calculator on
StatCrunch, or a website.


When n is large, the probability distribution function of a binomial random variable 
with parameters n and p can be approximated by that of a normal random variable 
having mean np and variance np(1 — p). 


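A random variable whose probability density function is of the form


$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

is said to be an exponential random variable with parameter $\lambda$. Its expected value
and variance are, respectively,


$$E[X] = \frac{1}{\lambda} \qquad \operatorname{Var}(X) = \frac{1}{\lambda^2}$$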
A key property possessed only by exponential random variables is that they are 
memoryless, in the sense that, for positive s and t, 


P{X>s+t|X>t}=P{X>s} 


If X represents the life of an item, then the memoryless property states that for any t, 
the remaining life of a t-year-old item has the same probability distribution as the life 
of a new item. Thus, one need not remember the age of an item to know its 
distribution of remaining life. 


Let X be a nonnegative continuous random variable with distribution function F and
density function f. The function


$$\lambda(t) = \frac{f(t)}{1-F(t)}, \qquad t \ge 0$$

is called the hazard rate, or failure rate, function of F. If we interpret X as being the
life of an item, then for small values of dt, $\lambda(t)\,dt$ is approximately the probability that
a t-unit-old item will fail within an additional time dt. If F is the exponential distribution
with parameter $\lambda$, then


$$\lambda(t) = \lambda, \qquad t \ge 0$$


In addition, the exponential is the unique distribution having a constant failure rate.
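Both the constant hazard rate and the memoryless property of the exponential distribution are easy to confirm numerically. The sketch below is illustrative: the helper names and the rate 0.7 are assumptions chosen for the example, and the distribution function is written with `math.expm1` only to limit rounding error.

```python
import math


def hazard_rate(density, cdf, t):
    """Hazard rate lambda(t) = f(t) / (1 - F(t))."""
    return density(t) / (1.0 - cdf(t))


def conditional_survival(cdf, s, t):
    """P{X > s + t | X > t} = (1 - F(s + t)) / (1 - F(t))."""
    return (1.0 - cdf(s + t)) / (1.0 - cdf(t))


def exponential_density(lam):
    """Return the density function of an exponential with rate lam."""
    return lambda t: lam * math.exp(-lam * t) if t >= 0 else 0.0


def exponential_cdf(lam):
    """Return the distribution function F(t) = 1 - exp(-lam * t) for t >= 0."""
    return lambda t: -math.expm1(-lam * t) if t >= 0 else 0.0


if __name__ == "__main__":
    lam = 0.7
    f, F = exponential_density(lam), exponential_cdf(lam)
    for t in (0.1, 3.0, 10.0):
        print(t, hazard_rate(f, F, t))  # each value should be close to lam
    # Memoryless property: the two numbers should agree.
    print(conditional_survival(F, 2.0, 5.0), 1.0 - F(2.0))
```

Replacing the exponential helpers with the density and distribution function of another lifetime distribution gives a hazard rate that varies with t.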


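A random variable is said to have a gamma distribution with parameters $\alpha$ and $\lambda$ if its
probability density function is equal to


$$f(x) = \frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}, \qquad x \ge 0$$


and is 0 otherwise. The quantity $\Gamma(\alpha)$ is called the gamma function and is defined
by

$$\Gamma(\alpha) = \int_0^\infty e^{-x}x^{\alpha-1}\,dx$$


The expected value and variance of a gamma random variable are, respectively,

$$E[X] = \frac{\alpha}{\lambda} \qquad \operatorname{Var}(X) = \frac{\alpha}{\lambda^2}$$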


A random variable is said to have a beta distribution with parameters (a, b) if its
probability density function is equal to


$$f(x) = \frac{1}{B(a,b)}\,x^{a-1}(1-x)^{b-1}, \qquad 0 \le x \le 1$$


and is equal to 0 otherwise. The constant B(a, b) is given by

$$B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$$


The mean and variance of such a random variable are, respectively,


$$E[X] = \frac{a}{a+b} \qquad \operatorname{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}$$


Problems


5.1. Let X be a random variable with probability density function

$$f(x) = \begin{cases} c(1-x^2) & -1 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$


a. What is the value of c? 
b. What is the cumulative distribution function of X? 


5.2. A system consisting of one original unit plus a spare can function for a 
random amount of time X. If the density of X is given (in units of months) by 


$$f(x) = \begin{cases} Cxe^{-x/2} & x > 0 \\ 0 & x \le 0 \end{cases}$$


what is the probability that the system functions for at least 5 months? 
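5.3. Consider the function

$$f(x) = \begin{cases} C(2x-x^3) & 0 < x < \frac{5}{2} \\ 0 & \text{otherwise} \end{cases}$$


Could f be a probability density function? If so, determine C. Repeat if f(x)
were given by

$$f(x) = \begin{cases} C(2x-x^2) & 0 < x < \frac{5}{2} \\ 0 & \text{otherwise} \end{cases}$$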


5.4. The probability density function of X, the lifetime of a certain type of
electronic device (measured in hours), is given by


$$f(x) = \begin{cases} \dfrac{10}{x^2} & x > 10 \\ 0 & x \le 10 \end{cases}$$


a. Find P{X > 20}. 

b. What is the cumulative distribution function of X? 

c. What is the probability that of 6 such types of devices, at least 3 will 
function for at least 15 hours? What assumptions are you making? 


5.5. A filling station is supplied with gasoline once a week. If its weekly volume
of sales in thousands of gallons is a random variable with probability density
function

$$f(x) = \begin{cases} 5(1-x)^4 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$


what must the capacity of the tank be so that the probability of the supply 
being exhausted in a given week is .01? 
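5.6. Compute E[X] if X has a density function given by


a. $f(x) = \begin{cases} \frac{1}{4}xe^{-x/2} & x > 0 \\ 0 & \text{otherwise} \end{cases}$
b. $f(x) = \begin{cases} c(1-x^2) & -1 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
c. $f(x) = \begin{cases} \dfrac{5}{x^2} & x > 5 \\ 0 & x \le 5 \end{cases}$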


5.7. The density function of X is given by

$$f(x) = \begin{cases} a+bx^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$


If $E[X] = \frac{3}{5}$, find a and b.


5.8. The lifetime in hours of an electronic tube is a random variable having a
probability density function given by

$$f(x) = xe^{-x}, \qquad x \ge 0$$


Compute the expected lifetime of such a tube. 

5.9. Consider Example 4b of Chapter 4, but now suppose that the
seasonal demand is a continuous random variable having probability density
function f. Show that the optimal amount to stock is the value $s^*$ that satisfies


$$F(s^*) = \frac{b}{b+\ell}$$


where b is net profit per unit sale, $\ell$ is the net loss per unit unsold, and F is the
cumulative distribution function of the seasonal demand.

5.10. Trains headed for destination A arrive at the train station at 15-minute 
intervals starting at 7 A.M., whereas trains headed for destination B arrive at 
15-minute intervals starting at 7:05 A.M. 

a. If a certain passenger arrives at the station at a time uniformly 
distributed between 7 and 8 A.M. and then gets on the first train that 
arrives, what proportion of time does he or she go to destination A? 

b. What if the passenger arrives at a time uniformly distributed between
7:10 and 8:10 A.M.?
5.11. A point is chosen at random on a line segment of length L. Interpret this
statement, and find the probability that the ratio of the shorter to the longer
segment is less than $\frac{1}{4}$.


5.12. A bus travels between the two cities A and B, which are 100 miles apart. 
If the bus has a breakdown, the distance from the breakdown to city A has a 
uniform distribution over (0, 100). There is a bus service station in city A, in B, 
and in the center of the route between A and B. It is suggested that it would 
be more efficient to have the three stations located 25, 50, and 75 miles, 
respectively, from A. Do you agree? Why? 
5.13. You arrive at a bus stop at 10 A.M., knowing that the bus will arrive at 
some time uniformly distributed between 10 and 10:30. 
a. What is the probability that you will have to wait longer than 10 
minutes? 
b. If, at 10:15, the bus has not yet arrived, what is the probability that you 
will have to wait at least an additional 10 minutes? 


5.14. Let X be a uniform (0, 1) random variable. Compute $E[X^n]$ by using
Proposition 2.1, and then check the result by using the definition of
expectation.
5.15. If X is a normal random variable with parameters $\mu = 10$ and $\sigma^2 = 36$,
compute

a. P{X > 5}; 

b. P{4 < X < 16}; 

c. P{X < 8}; 

d. P{X < 20}; 

e. P{X > 16}. 


5.16. The annual rainfall (in inches) in a certain region is normally distributed
with $\mu = 40$ and $\sigma = 4$. What is the probability that starting with this year, it will
take more than 10 years before a year occurs having a rainfall of more than 
50 inches? What assumptions are you making? 
5.17. The salaries of physicians in a certain speciality are approximately 
normally distributed. If 25 percent of these physicians earn less than $180,000 
and 25 percent earn more than $320,000, approximately what fraction earn 

a. less than $200,000? 

b. between $280,000 and $320,000? 


5.18. Suppose that X is a normal random variable with mean 5. If 

P{X > 9} =.2, approximately what is Var(X)? 

5.19. Let X be a normal random variable with mean 12 and variance 4. Find
the value of c such that P{X > c} = .10.
5.20. If 65 percent of the population of a large community is in favor of a
proposed rise in school taxes, approximate the probability that a random 
sample of 100 people will contain 

a. at least 50 who are in favor of the proposition; 

b. between 60 and 70 inclusive who are in favor; 

c. fewer than 75 in favor. 


5.21. Suppose that the height, in inches, of a 25-year-old man is a normal
random variable with parameters $\mu = 71$ and $\sigma^2 = 6.25$. What percentage of
25-year-old men are taller than 6 feet, 2 inches? What percentage of men in 
the 6-footer club are taller than 6 feet, 5 inches? 

5.22. Every day Jo practices her tennis serve by continually serving until she 
has had a total of 50 successful serves. If each of her serves is, independently 
of previous ones, successful with probability .4, approximately what is the 
probability that she will need more than 100 serves to accomplish her goal? 
Hint: Imagine even if Jo is successful that she continues to serve until she 
has served exactly 100 times. What must be true about her first 100 serves if 
she is to reach her goal? 

5.23. One thousand independent rolls of a fair die will be made. Compute an 
approximation to the probability that the number 6 will appear between 150 
and 200 times inclusively. If the number 6 appears exactly 200 times, find the 
probability that the number 5 will appear less than 150 times. 

5.24. The lifetimes of interactive computer chips produced by a certain
semiconductor manufacturer are normally distributed with parameters
$\mu = 1.4 \times 10^6$ hours and $\sigma = 3 \times 10^5$ hours. What is the approximate
probability that a batch of 100 chips will contain at least 20 whose lifetimes are
less than $1.8 \times 10^6$?

5.25. Each item produced by a certain manufacturer is, independently, of 
acceptable quality with probability .95. Approximate the probability that at 
most 10 of the next 150 items produced are unacceptable. 

5.26. Two types of coins are produced at a factory: a fair coin and a biased 
one that comes up heads 55 percent of the time. We have one of these coins 
but do not know whether it is a fair coin or a biased one. In order to ascertain 
which type of coin we have, we shall perform the following statistical test: We 
shall toss the coin 1000 times. If the coin lands on heads 525 or more times, 
then we shall conclude that it is a biased coin, whereas if it lands on heads 
fewer than 525 times, then we shall conclude that it is a fair coin. If the coin is 
actually fair, what is the probability that we shall reach a false conclusion? 
What would it be if the coin were biased? 

5.27. In 10,000 independent tosses of a coin, the coin landed on heads 5800
times. Is it reasonable to assume that the coin is not fair? Explain.
5.28. Twelve percent of the population is left handed. Approximate the
probability that there are at least 20 left-handers in a school of 200 students. 
State your assumptions. 

5.29. A model for the movement of a stock supposes that if the present price 
of the stock is s, then after one period, it will be either us with probability p or 
ds with probability 1 — p. Assuming that successive movements are 
independent, approximate the probability that the stock’s price will be up at 
least 30 percent after the next 1000 periods if u = 1.012, d = 0.990, and
p = .52.

5.30. An image is partitioned into two regions, one white and the other black.
A reading taken from a randomly chosen point in the white section will be
normally distributed with $\mu = 4$ and $\sigma^2 = 4$, whereas one taken from a
randomly chosen point in the black region will have a normally distributed
reading with parameters (6, 9). A point is randomly chosen on the image and
has a reading of 5. If the fraction of the image that is black is $\alpha$, for what value
of $\alpha$ would the probability of making an error be the same, regardless of
whether one concluded that the point was in the black region or in the white
region?

5.31.

a. A fire station is to be located along a road of length A, $A < \infty$. If fires
occur at points uniformly chosen on (0, A), where should the station be
located so as to minimize the expected distance from the fire? That is,
choose a so as to

minimize $E[|X - a|]$


when X is uniformly distributed over (0, A).

b. Now suppose that the road is of infinite length, stretching from point 0
outward to $\infty$. If the distance of a fire from point 0 is exponentially
distributed with rate $\lambda$, where should the fire station now be located?
That is, we want to minimize $E[|X - a|]$, where X is now exponential
with rate $\lambda$.


5.32. The time (in hours) required to repair a machine is an exponentially
distributed random variable with parameter $\lambda = \frac{1}{2}$. What is
a. the probability that a repair time exceeds 2 hours?


b. the conditional probability that a repair takes at least 10 hours, given
that its duration exceeds 9 hours?


5.33. If U is uniformly distributed on (0, 1), find the distribution of
$Y = -\log(U)$.
5.34. Jones figures that the total number of thousands of miles that a racing
auto can be driven before it would need to be junked is an exponential
random variable with parameter $\frac{1}{20}$. Smith has a used car that he claims has
been driven only 10,000 miles. If Jones purchases the car, what is the
probability that she would get at least 20,000 additional miles out of it? Repeat
under the assumption that the lifetime mileage of the car is not exponentially
distributed, but rather is (in thousands of miles) uniformly distributed over (0,
40).
5.35. If X is an exponential random variable with parameter $\lambda$, and c > 0, find
the density function of cX. What kind of random variable is cX?
5.36. The lung cancer hazard rate $\lambda(t)$ of a t-year-old male smoker is such
that

$$\lambda(t) = .027 + .00025(t-40)^2, \qquad t \ge 40$$


Assuming that a 40-year-old male smoker survives all other hazards, what is 
the probability that he survives to (a) age 50 and (b) age 60 without 
contracting lung cancer? 
5.37. Suppose that the life distribution of an item has the hazard rate function
$\lambda(t) = t^3$, t > 0. What is the probability that

a. the item survives to age 2? 

b. the item’s lifetime is between .4 and 1.4? 

c. a 1-year-old item will survive to age 2? 


5.38. If X is uniformly distributed over (−1, 1), find
(a) $P\{|X| > \frac{1}{2}\}$;
(b) the density function of the random variable |X|.

5.39. If Y is uniformly distributed over (0, 5), what is the probability that the 
roots of the equation $4x^2 + 4xY + Y + 2 = 0$ are both real?

5.40. If X is an exponential random variable with parameter $\lambda = 1$, compute
the probability density function of the random variable Y defined by Y = log X. 
5.41. If X is uniformly distributed over (a,b), find a and b if E[X] = 10, 

Var(X) = 48. 

5.42. If X is uniformly distributed over (0, 1), find the density function of $Y = e^{X}$.
5.43. Find the distribution of $R = A\sin\theta$, where A is a fixed constant and $\theta$ is
uniformly distributed on $(-\pi/2, \pi/2)$. Such a random variable R arises in the
theory of ballistics. If a projectile is fired from the origin at an angle $\alpha$ from the
earth with a speed v, then the point R at which it returns to the earth can be
expressed as $R = (v^2/g)\sin 2\alpha$, where g is the gravitational constant, equal to
980 centimeters per second squared.

5.44. Let Y be a lognormal random variable (see Example 7e for its
definition) and let c > 0 be a constant. Answer true or false to the following,
and then give an explanation for your answer.
a. cY is lognormal;
b. c + Y is lognormal.


Theoretical Exercises 


5.1. The speed of a molecule in a uniform gas at equilibrium is a random
variable whose probability density function is given by


$$f(x) = \begin{cases} ax^2e^{-bx^2} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

where b = m/2kT and k, T, and m denote, respectively, Boltzmann’s
constant, the absolute temperature of the gas, and the mass of the molecule.
Evaluate a in terms of b.
5.2. Show that


$$E[Y] = \int_0^\infty P\{Y > y\}\,dy - \int_0^\infty P\{Y < -y\}\,dy$$


Hint: Show that

$$\int_0^\infty P\{Y < -y\}\,dy = -\int_{-\infty}^0 x f_Y(x)\,dx$$

$$\int_0^\infty P\{Y > y\}\,dy = \int_0^\infty x f_Y(x)\,dx$$


5.3. Show that if X has density function f, then


$$E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)\,dx$$


Hint: Using Theoretical Exercise 5.2, start with


$$E[g(X)] = \int_0^\infty P\{g(X) > y\}\,dy - \int_0^\infty P\{g(X) < -y\}\,dy$$


and then proceed as in the proof given in the text when $g(X) \ge 0$.
5.4. Prove Corollary 2.1 
5.5. Use the result that for a nonnegative random variable Y,

$$E[Y] = \int_0^\infty P\{Y > t\}\,dt$$


to show that for a nonnegative random variable X,


$$E[X^n] = \int_0^\infty nx^{n-1}P\{X > x\}\,dx$$


Hint: Start with


$$E[X^n] = \int_0^\infty P\{X^n > t\}\,dt$$


and make the change of variables $t = x^n$.
5.6. Define a collection of events $E_a$, 0 < a < 1, having the property that
$P(E_a) = 1$ for all a but $P\left(\bigcap_a E_a\right) = 0$.


Hint: Let X be uniform over (0, 1) and define each $E_a$ in terms of X.
5.7. The standard deviation of X, denoted SD(X), is given by


$$SD(X) = \sqrt{\operatorname{Var}(X)}$$


Find SD(aX + b) if X has variance $\sigma^2$.
5.8. Let X be a random variable that takes on values between 0 and c. That
is, $P\{0 \le X \le c\} = 1$. Show that

$$\operatorname{Var}(X) \le \frac{c^2}{4}$$


Hint: One approach is to first argue that

$$E[X^2] \le cE[X]$$


and then use this inequality to show that


$$\operatorname{Var}(X) \le c^2[\alpha(1-\alpha)] \qquad \text{where } \alpha = \frac{E[X]}{c}$$


5.9. Show that if Z is a standard normal random variable, then, for x > 0,
a. $P\{Z > x\} = P\{Z < -x\}$;
b. $P\{|Z| > x\} = 2P\{Z > x\}$;
c. $P\{|Z| < x\} = 2P\{Z < x\} - 1$.


5.10. Let f(x) denote the probability density function of a normal random
variable with mean $\mu$ and variance $\sigma^2$. Show that $\mu - \sigma$ and $\mu + \sigma$ are points of
inflection of this function. That is, show that $f''(x) = 0$ when $x = \mu - \sigma$ or
$x = \mu + \sigma$.
5.11. Let Z be a standard normal random variable, and let g be a
differentiable function with derivative $g'$.

a. Show that $E[g'(Z)] = E[Zg(Z)]$;

b. Show that $E[Z^{n+1}] = nE[Z^{n-1}]$.

c. Find $E[Z^4]$.


5.12. Use the identity of Theoretical Exercise 5.5 to derive $E[X^2]$ when X
is an exponential random variable with parameter $\lambda$.
5.13. The median of a continuous random variable having distribution function
F is that value m such that $F(m) = \frac{1}{2}$. That is, a random variable is just as
likely to be larger than its median as it is to be smaller. Find the median of X if
X is

a. uniformly distributed over (a, b);

b. normal with parameters $\mu$, $\sigma^2$;

c. exponential with rate $\lambda$.


5.14. The mode of a continuous random variable having density f is the value 
of x for which f(x) attains its maximum. Compute the mode of X in cases (a), 
(b), and (c) of Theoretical Exercises 5.13 

5.15. If X is an exponential random variable with parameter $\lambda$, and c > 0,
show that cX is exponential with parameter $\lambda/c$.

5.16. Compute the hazard rate function of X when X is uniformly distributed 
over (0, a). 

5.17. If X has hazard rate function $\lambda(t)$, compute the hazard rate function of
aX where a is a positive constant. 

5.18. Verify that the gamma density function integrates to 1. 


5.19. If X is an exponential random variable with mean $1/\lambda$, show that


$$E[X^k] = \frac{k!}{\lambda^k}, \qquad k = 1, 2, \ldots$$


Hint: Make use of the gamma density function to evaluate $E[X^k]$.
5.20. Verify that

$$\operatorname{Var}(X) = \frac{\alpha}{\lambda^2}$$


when X is a gamma random variable with parameters $\alpha$ and $\lambda$.


5.21. Show that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.


Hint: $\Gamma\left(\frac{1}{2}\right) = \int_0^\infty e^{-x}x^{-1/2}\,dx$. Make the change of variables $y = \sqrt{2x}$ and
then relate the resulting expression to the normal distribution.
5.22. Compute the hazard rate function of a gamma random variable with
parameters $(\alpha, \lambda)$ and show it is increasing when $\alpha \ge 1$ and decreasing when
$\alpha \le 1$.

5.23. Compute the hazard rate function of a Weibull random variable and
show it is increasing when $\beta \ge 1$ and decreasing when $\beta \le 1$.

5.24. Show that a plot of $\log(\log(1 - F(x))^{-1})$ against log x will be a straight
line with slope $\beta$ when $F(\cdot)$ is a Weibull distribution function. Show also that
approximately 63.2 percent of all observations from such a distribution will be
less than $\alpha$. Assume that $\nu = 0$.

5.25. Let

$$Y = \left(\frac{X-\nu}{\alpha}\right)^{\beta}$$

Show that if X is a Weibull random variable with parameters $\nu$, $\alpha$, and $\beta$, then
Y is an exponential random variable with parameter $\lambda = 1$ and vice versa.
5.26. Let F be a continuous distribution function. If U is uniformly distributed
on (0, 1), find the distribution function of $Y = F^{-1}(U)$, where $F^{-1}$ is the inverse
function of F. (That is, $y = F^{-1}(x)$ if F(y) = x.)
5.27. If X is uniformly distributed over (a, b), what random variable, having a
linear relation with X, is uniformly distributed over (0, 1)?
5.28. Consider the beta distribution with parameters (a, b). Show that 
a. when a > 1 and b > 1, the density is unimodal (that is, it has a unique 
mode) with mode equal to (a — 1)/(a+ b— 2); 
b. when $a \le 1$, $b \le 1$, and a + b < 2, the density is either unimodal with
mode at 0 or 1 or U-shaped with modes at both 0 and 1; 
c. when a = 1 = b, all points in [0, 1] are modes. 


5.29. Let X be a continuous random variable having cumulative distribution 
function F. Define the random variable Y by Y = F(X). Show that Y is uniformly 
distributed over (0, 1). 
5.30. Let X have probability density $f_X$. Find the probability density function of
the random variable Y defined by Y = aX + b.
5.31. Find the probability density function of $Y = e^{X}$ when X is normally
distributed with parameters $\mu$ and $\sigma^2$. The random variable Y is said to have a
lognormal distribution (since log Y has a normal distribution) with parameters
$\mu$ and $\sigma^2$.
5.32. Let X and Y be independent random variables that are both equally likely
to be either 1, 2, ..., $(10)^N$, where N is very large. Let D denote the greatest
common divisor of X and Y, and let $Q_k = P\{D = k\}$.

a. Give a heuristic argument that $Q_k = \frac{1}{k^2}Q_1$.


Hint: Note that in order for D to equal k, k must divide both X and Y
and also X/k and Y/k must be relatively prime. (That is, X/k and
Y/k must have a greatest common divisor equal to 1.)
b. Use part (a) to show that

$$Q_1 = P\{X \text{ and } Y \text{ are relatively prime}\} = \frac{1}{\sum_{k=1}^{\infty} 1/k^2}$$


It is a well-known identity that $\sum_{k=1}^{\infty} 1/k^2 = \pi^2/6$, so $Q_1 = 6/\pi^2$. (In
number theory, this is known as the Legendre theorem.)


c. Now argue that

$$Q_1 = \prod_{i=1}^{\infty}\left(\frac{P_i^2-1}{P_i^2}\right)$$


where $P_i$ is the ith-smallest prime greater than 1.
Hint: X and Y will be relatively prime if they have no common prime
factors. Hence, from part (b), we see that


$$\prod_{i=1}^{\infty}\left(\frac{P_i^2-1}{P_i^2}\right) = \frac{6}{\pi^2}$$


5.33. Prove Theorem 7.1 when g(x) is a decreasing function.


Self-Test Problems and Exercises 


5.1. The number of minutes of playing time of a certain high school basketball 
player in a randomly chosen game is a random variable whose probability 
density function is given in the following figure: 

[Figure: density function of playing time; vertical-axis values .025 and .050, horizontal-axis values 10, 20, 30, 40]


Find the probability that the player plays 
a. more than 15 minutes; 
b. between 20 and 35 minutes; 
c. less than 30 minutes; 
d. more than 36 minutes.
5.2. For some constant c, the random variable X has the probability density
function

$$f(x) = \begin{cases} cx^n & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

Find (a) c and (b) P{X > x}, 0 < x < 1.
5.3. For some constant c, the random variable X has the probability density
function

$$f(x) = \begin{cases} cx^4 & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases}$$


Find (a) E[X] and (b) Var(X).
5.4. The random variable X has the probability density function

$$f(x) = \begin{cases} ax + bx^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$


If E[X] = .6, find (a) $P\{X < \frac{1}{2}\}$ and (b) Var(X).


5.5. The random variable X is said to be a discrete uniform random variable
on the integers 1, 2, ..., n if


$$P\{X = i\} = \frac{1}{n}, \qquad i = 1, 2, \ldots, n$$


For any nonnegative real number x, let Int(x) (sometimes written as [x]) be 
the largest integer that is less than or equal to x. Show that if U is a uniform 
random variable on (0, 1), then X = Int (nU) + 1 is a discrete uniform random 
variable on 1,...,n. 
5.6. Your company must make a sealed bid for a construction project. If you 
succeed in winning the contract (by having the lowest bid), then you plan to 
pay another firm $100,000 to do the work. If you believe that the minimum bid 
(in thousands of dollars) of the other participating companies can be modeled 
as the value of a random variable that is uniformly distributed on (70, 140), 
how much should you bid to maximize your expected profit? 
5.7. To be a winner in a certain game, you must be successful in three 
successive rounds. The game depends on the value of U, a uniform random 
variable on (0, 1). If U > .1, then you are successful in round 1; if U > .2, then 
you are successful in round 2; and if U > .3, then you are successful in round 
3. 

a. Find the probability that you are successful in round 1. 

b. Find the conditional probability that you are successful in round 2 given 

that you were successful in round 1. 
c. Find the conditional probability that you are successful in round 3 given 
that you were successful in rounds 1 and 2.
d. Find the probability that you are a winner. 


5.8. A randomly chosen IQ test taker obtains a score that is approximately a 
normal random variable with mean 100 and standard deviation 15. What is the 
probability that the score of such a person is (a) more than 125; (b) between 
90 and 110? 
5.9. Suppose that the travel time from your home to your office is normally 
distributed with mean 40 minutes and standard deviation 7 minutes. If you 
want to be 95 percent certain that you will not be late for an office appointment 
at 1 P.M., what is the latest time that you should leave home? 
5.10. The life of a certain type of automobile tire is normally distributed with 
mean 34,000 miles and standard deviation 4000 miles. 

a. What is the probability that such a tire lasts more than 40,000 miles? 

b. What is the probability that it lasts between 30,000 and 35,000 miles? 

c. Given that it has survived 30,000 miles, what is the conditional 

probability that the tire survives another 10,000 miles? 


5.11. The annual rainfall in Cleveland, Ohio, is approximately a normal 
random variable with mean 40.2 inches and standard deviation 8.4 inches. 
What is the probability that 
a. next year’s rainfall will exceed 44 inches? 
b. the yearly rainfalls in exactly 3 of the next 7 years will exceed 44 
inches? 


Assume that if $A_i$ is the event that the rainfall exceeds 44 inches in year i
(from now), then the events $A_i$, i ≥ 1, are independent.

5.12. The following table uses 1992 data concerning the percentages of male 
and female full-time workers whose annual salaries fall into different ranges: 


Earnings range     Percentage of females    Percentage of males

≤9999                     8.6                     4.4
10,000–19,999            38.0                    21.4
20,000–24,999            19.4                    15.8
25,000–49,999            29.2                    41.5
≥50,000                   4.8                    17.2

Suppose that random samples of 200 male and 200 female full-time workers
are chosen. Approximate the probability that 
a. at least 70 of the women earn $25,000 or more; 
b. at most 60 percent of the men earn $25,000 or more; 
c. at least three-fourths of the men and at least half the women earn 
$20,000 or more. 


5.13. At a certain bank, the amount of time that a customer spends being 
served by a teller is an exponential random variable with mean 5 minutes. If 
there is a customer in service when you enter the bank, what is the probability 
that he or she will still be with the teller after an additional 4 minutes? 

5.14. Suppose that the cumulative distribution function of the random variable 
X is given by 


$$F(x) = 1 - e^{-x^2}, \quad x > 0$$


Evaluate (a) P{X > 2}; (b) P{1 < X < 3}; (c) the hazard rate function of F; (d) E[X];
(e) Var(X).

Hint: For parts (d) and (e), you might want to make use of the results of 
Theoretical Exercise 5.5. 

5.15. The number of years that a washing machine functions is a random 
variable whose hazard rate function is given by 


$$\lambda(t) = \begin{cases} .2 & 0 < t < 2 \\ .2 + .3(t - 2) & 2 \le t < 5 \\ 1.1 & t > 5 \end{cases}$$


a. What is the probability that the machine will still be working 6 years 
after being purchased? 

b. If it is still working 6 years after being purchased, what is the 
conditional probability that it will fail within the next 2 years? 


5.16. A standard Cauchy random variable has density function 


$$f(x) = \frac{1}{\pi (1 + x^2)}, \quad -\infty < x < \infty$$


Show that if X is a standard Cauchy random variable, then 1/X is also a 
standard Cauchy random variable. 
5.17. A roulette wheel has 38 slots, numbered 0, 00, and 1 through 36. If you 
bet 1 on a specified number, then you either win 35 if the roulette ball lands on 
that number or lose 1 if it does not. If you continually make such bets, 
approximate the probability that 

a. you are winning after 34 bets; 
b. you are winning after 1000 bets;
c. you are winning after 100,000 bets. 


Assume that each roll of the roulette ball is equally likely to land on any of the 
38 numbers. 

5.18. There are two types of batteries in a bin. When in use, type i batteries 
last (in hours) an exponentially distributed time with rate $\lambda_i$, i = 1, 2. A battery
that is randomly chosen from the bin will be a type i battery with probability
$p_i$, $\sum_{i=1}^{2} p_i = 1$. If a randomly chosen battery is still operating after t hours of
use, what is the probability that it will still be operating after an additional s 
hours? 
5.19. Evidence concerning the guilt or innocence of a defendant in a criminal 
investigation can be summarized by the value of an exponential random 
variable X whose mean μ depends on whether the defendant is guilty. If
innocent, μ = 1; if guilty, μ = 2. The deciding judge will rule the defendant
guilty if X > c for some suitably chosen value of c. 
a. If the judge wants to be 95 percent certain that an innocent man will not 
be convicted, what should be the value of c? 
b. Using the value of c found in part (a), what is the probability that a 
guilty defendant will be convicted? 


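5.20. For any real number y, define $y^+$ by

$$y^+ = \begin{cases} y, & \text{if } y \ge 0 \\ 0, & \text{if } y < 0 \end{cases}$$

Let c be a constant.
a. Show that

$$E[(Z - c)^+] = \frac{1}{\sqrt{2\pi}} e^{-c^2/2} - c\,(1 - \Phi(c))$$

when Z is a standard normal random variable.
b. Find $E[(X - c)^+]$ when X is normal with mean μ and variance $\sigma^2$.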


5.21. With Φ(x) being the probability that a normal random variable with
mean 0 and variance 1 is less than x, which of the following are true:

a. Φ(−x) = Φ(x)

b. Φ(x) + Φ(−x) = 1

c. Φ(−x) = 1/Φ(x)


5.22. Let U be a uniform (0, 1) random variable, and let a < b be constants. 
a. Show that if b > 0, then bU is uniformly distributed on (0, b), and if
b < 0, then bU is uniformly distributed on (b, 0).
b. Show that a + U is uniformly distributed on (a, 1 + a). 
c. What function of U is uniformly distributed on (a, b)? 
d. Show that min(U, 1 — U) is a uniform (0, 1/2) random variable. 
e. Show that max(U, 1 — U) is a uniform (1/2, 1) random variable. 


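5.23. Let

$$f(x) = \begin{cases} \frac{1}{3} e^{x} & \text{if } x < 0 \\ \frac{1}{3} & \text{if } 0 \le x < 1 \\ \frac{1}{3} e^{-(x-1)} & \text{if } x \ge 1 \end{cases}$$

a. Show that f is a probability density function. (That is, show that
$f(x) \ge 0$, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.)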


b. If X has density function f, find E[X]. 


5.24. Let

$$f(x) = \frac{\theta^2}{1 + \theta} (1 + x) e^{-\theta x}, \quad x > 0$$

where θ > 0.
a. Show that f(x) is a density function. That is, show that $f(x) \ge 0$, and
that $\int_0^{\infty} f(x)\,dx = 1$.


b. Find E[X] 
c. Find Var(X). 


Chapter 6 Jointly Distributed Random Variables


Contents 


6.1 Joint Distribution Functions 
6.3 Sums of Independent Random Variables

6.5 Conditional Distributions: Continuous Case

6.6 Order Statistics 

6.7 Joint Probability Distribution of Functions of Random Variables 


6.8 Exchangeable Random Variables 


6.1 Joint Distribution Functions 


Thus far, we have concerned ourselves only with probability distributions for single 
random variables. However, we are often interested in probability statements 
concerning two or more random variables. In order to deal with such probabilities, we 
define, for any two random variables X and Y, the joint cumulative probability 
distribution function of X and Y by 


$$F(a, b) = P\{X \le a, Y \le b\}, \quad -\infty < a, b < \infty$$

All joint probability statements about X and Y can, in theory, be answered in terms of
their joint distribution function. For instance,

(1.1)
$$P(a_1 < X \le a_2, b_1 < Y \le b_2) = F(a_2, b_2) + F(a_1, b_1) - F(a_1, b_2) - F(a_2, b_1)$$

whenever $a_1 < a_2$, $b_1 < b_2$. To verify Equation (1.1), note that for $a_1 < a_2$,

$$P(X \le a_2, Y \le b) = P(X \le a_1, Y \le b) + P(a_1 < X \le a_2, Y \le b)$$

giving that

(1.2)
$$P(a_1 < X \le a_2, Y \le b) = F(a_2, b) - F(a_1, b)$$

Also, because for $b_1 < b_2$,

$$P(a_1 < X \le a_2, Y \le b_2) = P(a_1 < X \le a_2, Y \le b_1) + P(a_1 < X \le a_2, b_1 < Y \le b_2)$$

we have that when $a_1 < a_2$, $b_1 < b_2$,

$$P(a_1 < X \le a_2, b_1 < Y \le b_2) = P(a_1 < X \le a_2, Y \le b_2) - P(a_1 < X \le a_2, Y \le b_1)$$
$$= F(a_2, b_2) - F(a_1, b_2) - F(a_2, b_1) + F(a_1, b_1)$$

where the final equality used Equation (1.2).


When X and Y are discrete random variables, with X taking on one of the values $x_i$,
i ≥ 1, and Y one of the values $y_j$, j ≥ 1, it is convenient to define the joint probability
mass function of X and Y by

$$p(x, y) = P(X = x, Y = y)$$

Using that the event {X = x} is the union of the mutually exclusive events
$\{X = x, Y = y_j\}$, j ≥ 1, it follows that the probability mass function of X can be
obtained from the joint probability mass function by

$$p_X(x) = P(X = x) = P\Big(\bigcup_j \{X = x, Y = y_j\}\Big) = \sum_j P(X = x, Y = y_j) = \sum_j p(x, y_j)$$

Similarly, the probability mass function of Y is obtained from

$$p_Y(y) = \sum_i p(x_i, y)$$


Example 1a 


Suppose that 3 balls are randomly selected from an urn containing 3 red, 4 white, 
and 5 blue balls. If we let X and Y denote, respectively, the number of red and 
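white balls chosen, then the joint probability mass function of X and
Y, p(i, j) = P{X = i, Y = j}, is obtained by noting that X = i, Y = j if, of the 3 balls
selected, i are red, j are white, and 3 − i − j are blue. Because all subsets of
size 3 are equally likely to be chosen, it follows that

$$p(i, j) = \frac{\binom{3}{i} \binom{4}{j} \binom{5}{3-i-j}}{\binom{12}{3}}$$

Consequently,

$$p(0, 0) = \binom{5}{3} \Big/ \binom{12}{3} = \frac{10}{220}$$
$$p(0, 1) = \binom{4}{1}\binom{5}{2} \Big/ \binom{12}{3} = \frac{40}{220}$$
$$p(0, 2) = \binom{4}{2}\binom{5}{1} \Big/ \binom{12}{3} = \frac{30}{220}$$
$$p(0, 3) = \binom{4}{3} \Big/ \binom{12}{3} = \frac{4}{220}$$
$$p(1, 0) = \binom{3}{1}\binom{5}{2} \Big/ \binom{12}{3} = \frac{30}{220}$$
$$p(1, 1) = \binom{3}{1}\binom{4}{1}\binom{5}{1} \Big/ \binom{12}{3} = \frac{60}{220}$$
$$p(1, 2) = \binom{3}{1}\binom{4}{2} \Big/ \binom{12}{3} = \frac{18}{220}$$
$$p(2, 0) = \binom{3}{2}\binom{5}{1} \Big/ \binom{12}{3} = \frac{15}{220}$$
$$p(2, 1) = \binom{3}{2}\binom{4}{1} \Big/ \binom{12}{3} = \frac{12}{220}$$
$$p(3, 0) = \binom{3}{3} \Big/ \binom{12}{3} = \frac{1}{220}$$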


These probabilities can most easily be expressed in tabular form, as in Table 
6.1. The reader should note that the probability mass function of X is obtained 
by computing the row sums, whereas the probability mass function of Y is 
obtained by computing the column sums. Because the individual probability mass 
functions of X and Y thus appear in the margin of such a table, they are often 
referred to as the marginal probability mass functions of X and Y, respectively. 
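
The computation in Example 1a can be sketched in a few lines of Python. This is only an illustrative check, not part of the original example; the function names `joint_pmf` and `marginals` are made up for the sketch, and exact fractions are used so that the values can be compared directly with those listed above.

```python
from fractions import Fraction
from math import comb


def joint_pmf(red=3, white=4, blue=5, draws=3):
    """Joint pmf of (number of red, number of white) among the drawn balls."""
    total = comb(red + white + blue, draws)
    pmf = {}
    for i in range(min(red, draws) + 1):
        for j in range(min(white, draws - i) + 1):
            k = draws - i - j  # number of blue balls drawn
            if k <= blue:
                ways = comb(red, i) * comb(white, j) * comb(blue, k)
                pmf[(i, j)] = Fraction(ways, total)
    return pmf


def marginals(pmf):
    """Row sums give the pmf of X; column sums give the pmf of Y."""
    p_x, p_y = {}, {}
    for (i, j), p in pmf.items():
        p_x[i] = p_x.get(i, 0) + p
        p_y[j] = p_y.get(j, 0) + p
    return p_x, p_y


if __name__ == "__main__":
    pmf = joint_pmf()
    p_x, p_y = marginals(pmf)
    for key in sorted(pmf):
        print(key, pmf[key])
    print("P{X = i}:", p_x)
    print("P{Y = j}:", p_y)
```

Summing the dictionary by its first index reproduces the row sums of the table, and summing by its second index reproduces the column sums.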


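Table 6.1 P{X = i, Y = j}.

                                      j
i                          0         1         2         3      Row sum = P{X = i}
0                       10/220    40/220    30/220     4/220         84/220
1                       30/220    60/220    18/220       0          108/220
2                       15/220    12/220       0         0           27/220
3                        1/220       0         0         0            1/220
Column sum = P{Y = j}   56/220   112/220    48/220     4/220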


Example 1b 


Suppose that 15 percent of the families in a certain community have no children, 
20 percent have 1 child, 35 percent have 2 children, and 30 percent have 3. 
Suppose further that in each family each child is equally likely (independently) to 
be a boy or a girl. If a family is chosen at random from this community, then B, 
the number of boys, and G, the number of girls, in this family will have the joint 
probability mass function shown in Table 6.2.


Table 6.2 P{B = i, G = j}.

                                      j
i                          0         1         2         3      Row sum = P{B = i}
0                        .15       .10      .0875     .0375          .3750
1                        .10       .175     .1125       0            .3875
2                        .0875     .1125      0         0            .2000
3                        .0375       0        0         0            .0375
Column sum = P{G = j}    .3750     .3875    .2000     .0375


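The probabilities shown in Table 6.2 are obtained as follows:

$$P\{B = 0, G = 0\} = P\{\text{no children}\} = .15$$

$$P\{B = 0, G = 1\} = P\{\text{1 girl and total of 1 child}\} = P\{\text{1 child}\}\,P\{\text{1 girl} \mid \text{1 child}\} = (.20)\left(\tfrac{1}{2}\right)$$

$$P\{B = 0, G = 2\} = P\{\text{2 girls and total of 2 children}\} = P\{\text{2 children}\}\,P\{\text{2 girls} \mid \text{2 children}\} = (.35)\left(\tfrac{1}{2}\right)^2$$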


We leave the verification of the remaining probabilities in the table to the reader. 
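
The remaining entries of Table 6.2 can be checked mechanically by conditioning on the family size, exactly as in the three cases worked above. The following Python sketch does this with exact fractions; the function name `family_joint_pmf` and the dictionary `SIZE_PROBS` are illustrative names, not notation from the text.

```python
from fractions import Fraction
from math import comb

# Family-size distribution from Example 1b (0, 1, 2, or 3 children).
SIZE_PROBS = {
    0: Fraction(15, 100),
    1: Fraction(20, 100),
    2: Fraction(35, 100),
    3: Fraction(30, 100),
}


def family_joint_pmf(size_probs=SIZE_PROBS):
    """P{B = b, G = g} = P{b + g children} * C(b + g, b) * (1/2)^(b + g)."""
    pmf = {}
    for n, p_n in size_probs.items():
        for b in range(n + 1):
            g = n - b
            pmf[(b, g)] = p_n * Fraction(comb(n, b), 2 ** n)
    return pmf


if __name__ == "__main__":
    pmf = family_joint_pmf()
    for b in range(4):
        row = [float(pmf.get((b, g), 0)) for g in range(4)]
        print(b, row, "row sum =", float(sum(pmf.get((b, g), 0) for g in range(4))))
```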


Example 1c 


Consider independent trials where each trial is a success with probability p. Let
$X_r$ denote the number of trials until there have been r successes, and let $Y_s$
denote the number of trials until there have been s failures. Suppose we want to
derive their joint probability mass function $P(X_r = i, Y_s = j)$. To do so, first
consider the case i < j. In this case, write

$$P(X_r = i, Y_s = j) = P(X_r = i)\,P(Y_s = j \mid X_r = i)$$

Now, if there have been r successes after trial i, then there have been i − r
failures by that point. Hence, the conditional distribution of $Y_s$, given that $X_r = i$,
is the distribution of i plus the number of additional trials after trial i until there
have been an additional s − i + r failures. Hence,

$$P(X_r = i, Y_s = j) = P(X_r = i)\,P(Y_{s-i+r} = j - i), \quad i < j$$

Because $X_r$ is a negative binomial random variable with parameters (r, p) and
$Y_{s-i+r}$ is a negative binomial random variable with parameters (s − i + r, 1 − p),
the preceding yields

$$P(X_r = i, Y_s = j) = \binom{i-1}{r-1} p^r (1-p)^{i-r} \binom{j-i-1}{s-i+r-1} (1-p)^{s-i+r} p^{j-s-r}, \quad i < j$$


We leave it as an exercise to determine the analogous expression when j < i. 
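
As a sanity check on the case i < j, the product of the two negative binomial probabilities can be compared with a brute-force sum over all success/failure strings of length j. The Python sketch below is an illustration only; the function names are hypothetical, and the brute-force routine is practical only for small j because it enumerates $2^j$ strings.

```python
from itertools import product
from math import comb, isclose


def joint_pmf_formula(i, j, r, s, p):
    """P(X_r = i, Y_s = j) for i < j, from the negative binomial argument."""
    m = s - i + r  # failures still needed after trial i
    if not (r <= i < j) or m < 1 or j - i < m:
        return 0.0
    p_x = comb(i - 1, r - 1) * p**r * (1 - p) ** (i - r)
    p_y = comb(j - i - 1, m - 1) * (1 - p) ** m * p ** (j - i - m)
    return p_x * p_y


def joint_pmf_bruteforce(i, j, r, s, p):
    """Same probability, summing over every outcome string of length j."""
    total = 0.0
    for seq in product((1, 0), repeat=j):  # 1 = success, 0 = failure
        successes = failures = 0
        x = y = None
        for t, outcome in enumerate(seq, start=1):
            if outcome == 1:
                successes += 1
                if successes == r:
                    x = t  # trial of the r-th success
            else:
                failures += 1
                if failures == s:
                    y = t  # trial of the s-th failure
        if x == i and y == j:
            k = sum(seq)
            total += p**k * (1 - p) ** (j - k)
    return total


if __name__ == "__main__":
    r, s, p = 2, 3, 0.3
    for i in range(1, 6):
        for j in range(i + 1, 9):
            a = joint_pmf_formula(i, j, r, s, p)
            b = joint_pmf_bruteforce(i, j, r, s, p)
            print(i, j, round(a, 6), round(b, 6), isclose(a, b, abs_tol=1e-12))
```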


We say that X and Y are jointly continuous if there exists a function f(x, y), defined 
for all real x and y, having the property that for every set C of pairs of real numbers 
(that is, C is a set in the two-dimensional plane), 
(1.3)

$$P\{(X, Y) \in C\} = \iint_{(x,y) \in C} f(x, y)\,dx\,dy$$


The function f(x, y) is called the joint probability density function of X and Y. If A and
B are any sets of real numbers, then by defining $C = \{(x, y): x \in A, y \in B\}$, we see
from Equation (1.3) that

(1.4)

$$P\{X \in A, Y \in B\} = \int_B \int_A f(x, y)\,dx\,dy$$

Because

$$F(a, b) = P\{X \in (-\infty, a], Y \in (-\infty, b]\} = \int_{-\infty}^{b} \int_{-\infty}^{a} f(x, y)\,dx\,dy$$

it follows, upon differentiation, that

$$f(a, b) = \frac{\partial^2}{\partial a\,\partial b} F(a, b)$$

wherever the partial derivatives are defined. Another interpretation of the joint density
function, obtained from Equation (1.4), is

$$P\{a < X < a + da, b < Y < b + db\} = \int_b^{b+db} \int_a^{a+da} f(x, y)\,dx\,dy \approx f(a, b)\,da\,db$$

when da and db are small and f(x, y) is continuous at a, b. Hence, f(a, b) is a
measure of how likely it is that the random vector (X, Y) will be near (a, b).


If X and Y are jointly continuous, they are individually continuous, and their 
probability density functions can be obtained as follows: 


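$$P\{X \in A\} = P\{X \in A, Y \in (-\infty, \infty)\} = \int_A \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = \int_A f_X(x)\,dx$$

where

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

is thus the probability density function of X. Similarly, the probability density function
of Y is given by

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$
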
Example 1d 
The joint density function of X and Y is given by 


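$$f(x, y) = \begin{cases} 2 e^{-x} e^{-2y} & 0 < x < \infty,\ 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

Compute (a) P{X > 1, Y < 1}, (b) P{X < Y}, and (c) P{X < a}.


Solution

a.
$$P\{X > 1, Y < 1\} = \int_0^1 \int_1^{\infty} 2 e^{-x} e^{-2y}\,dx\,dy$$

Now,

$$\int_1^{\infty} e^{-x}\,dx = -e^{-x}\Big|_1^{\infty} = e^{-1}$$

giving that

$$P\{X > 1, Y < 1\} = e^{-1} \int_0^1 2 e^{-2y}\,dy = e^{-1}(1 - e^{-2})$$

b.
$$P\{X < Y\} = \iint_{(x,y):\, x < y} 2 e^{-x} e^{-2y}\,dx\,dy$$
$$= \int_0^{\infty} \int_0^{y} 2 e^{-x} e^{-2y}\,dx\,dy$$
$$= \int_0^{\infty} 2 e^{-2y} (1 - e^{-y})\,dy$$
$$= \int_0^{\infty} 2 e^{-2y}\,dy - \int_0^{\infty} 2 e^{-3y}\,dy$$
$$= 1 - \frac{2}{3} = \frac{1}{3}$$

c.
$$P\{X < a\} = \int_0^a \int_0^{\infty} 2 e^{-2y} e^{-x}\,dy\,dx = \int_0^a e^{-x}\,dx = 1 - e^{-a}$$
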
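
The three answers of Example 1d can be checked by simulation. The sketch below relies on an observation that is not made in the example itself: the density $2e^{-x}e^{-2y}$ is the product of an exponential density with rate 1 in x and an exponential density with rate 2 in y, so (X, Y) can be sampled as two independent exponentials. The function names and the choice a = 1.5 are arbitrary, for illustration only.

```python
import math
import random


def exact_example_1d(a=1.5):
    """Closed-form answers (a), (b), (c) derived in the solution."""
    return math.exp(-1) * (1 - math.exp(-2)), 1 / 3, 1 - math.exp(-a)


def estimate_example_1d(a=1.5, n=200_000, seed=0):
    """Monte Carlo estimates of the same three probabilities."""
    rng = random.Random(seed)
    hits_a = hits_b = hits_c = 0
    for _ in range(n):
        x = rng.expovariate(1.0)  # marginal of X: rate 1 (assumed factorization)
        y = rng.expovariate(2.0)  # marginal of Y: rate 2 (assumed factorization)
        hits_a += (x > 1 and y < 1)
        hits_b += (x < y)
        hits_c += (x < a)
    return hits_a / n, hits_b / n, hits_c / n


if __name__ == "__main__":
    print("exact:    ", exact_example_1d())
    print("simulated:", estimate_example_1d())
```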
Example 1e 


Consider a circle of radius R, and suppose that a point within the circle is 
randomly chosen in such a manner that all regions within the circle of equal area 
are equally likely to contain the point. (In other words, the point is uniformly 
distributed within the circle.) If we let the center of the circle denote the origin and 
define X and Y to be the coordinates of the point chosen (Figure 6.1), then,
since (X, Y) is equally likely to be near each point in the circle, it follows that the 
joint density function of X and Y is given by 


$$f(x, y) = \begin{cases} c & \text{if } x^2 + y^2 \le R^2 \\ 0 & \text{if } x^2 + y^2 > R^2 \end{cases}$$

for some value of c.


Figure 6.1 Joint probability distribution.


a. Determine c. 

b. Find the marginal density functions of X and Y. 

c. Compute the probability that D, the distance from the origin of the point 
selected, is less than or equal to a. 

d. Find E [D]. 


Solution 


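a. Because

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1$$

it follows that

$$c \iint_{x^2 + y^2 \le R^2} dy\,dx = 1$$

We can evaluate $\iint_{x^2 + y^2 \le R^2} dy\,dx$ either by using polar coordinates or,
more simply, by noting that it represents the area of the circle and is thus
equal to $\pi R^2$. Hence,

$$c = \frac{1}{\pi R^2}$$

b.
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$
$$= \frac{1}{\pi R^2} \int_{x^2 + y^2 \le R^2} dy$$
$$= \frac{1}{\pi R^2} \int_{-a}^{a} dy, \quad \text{where } a = \sqrt{R^2 - x^2}$$
$$= \frac{2}{\pi R^2} \sqrt{R^2 - x^2}, \quad x^2 \le R^2$$

and it equals 0 when $x^2 > R^2$. By symmetry, the marginal density of Y is
given by

$$f_Y(y) = \frac{2}{\pi R^2} \sqrt{R^2 - y^2}, \quad y^2 \le R^2$$
$$= 0, \quad y^2 > R^2$$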


c. The distribution function of $D = \sqrt{X^2 + Y^2}$, the distance from the origin, is
obtained as follows: For 0 ≤ a ≤ R,

$$F_D(a) = P\{\sqrt{X^2 + Y^2} \le a\}$$
$$= P\{X^2 + Y^2 \le a^2\}$$
$$= \iint_{x^2 + y^2 \le a^2} f(x, y)\,dy\,dx$$
$$= \frac{1}{\pi R^2} \iint_{x^2 + y^2 \le a^2} dy\,dx$$
$$= \frac{\pi a^2}{\pi R^2}$$
$$= \frac{a^2}{R^2}$$

where we have used the fact that $\iint_{x^2 + y^2 \le a^2} dy\,dx$ is the area of a circle
of radius a and thus is equal to $\pi a^2$.
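
The distribution function found in part (c) can be checked by simulation. The sketch below draws points uniformly in the circle by rejection sampling from the enclosing square, which is one convenient way to realize "uniformly distributed within the circle" and is an assumption of this sketch, not a method from the example; the function names are hypothetical.

```python
import math
import random


def sample_uniform_disk(R, rng):
    """Rejection sampling: draw from the square [-R, R]^2 until inside the circle."""
    while True:
        x = rng.uniform(-R, R)
        y = rng.uniform(-R, R)
        if x * x + y * y <= R * R:
            return x, y


def estimate_distance_cdf(a, R=2.0, n=100_000, seed=1):
    """Monte Carlo estimate of F_D(a) = P{D <= a} for a point uniform in the circle."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        x, y = sample_uniform_disk(R, rng)
        if math.hypot(x, y) <= a:
            count += 1
    return count / n


if __name__ == "__main__":
    R = 2.0
    for a in (0.5, 1.0, 1.5):
        print(a, estimate_distance_cdf(a, R), "vs a^2/R^2 =", a * a / (R * R))
```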


d. From part (c), the density function of D is

$$f_D(a) = \frac{2a}{R^2}, \quad 0 \le a \le R$$

Hence,

$$E[D] = \frac{2}{R^2} \int_0^R a^2\,da = \frac{2R}{3}$$


Example 1f


The joint density of X and Y is given by 


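$$f(x, y) = \begin{cases} e^{-(x+y)} & 0 < x < \infty,\ 0 < y < \infty \\ 0 & \text{otherwise} \end{cases}$$

Find the density function of the random variable X/Y.


Solution


We start by computing the distribution function of X/Y. For a > 0,

$$F_{X/Y}(a) = P\left\{\frac{X}{Y} \le a\right\}$$
$$= \iint_{x/y \le a} e^{-(x+y)}\,dx\,dy$$
$$= \int_0^{\infty} \int_0^{ay} e^{-(x+y)}\,dx\,dy$$
$$= \int_0^{\infty} (1 - e^{-ay}) e^{-y}\,dy$$
$$= \left[ -e^{-y} + \frac{e^{-(a+1)y}}{a+1} \right]_0^{\infty}$$
$$= 1 - \frac{1}{a+1}$$

Differentiation shows that the density function of X/Y is given by
$f_{X/Y}(a) = 1/(a+1)^2$, 0 < a < ∞.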
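
A quick simulation can be used to check the distribution function of X/Y. The sketch assumes, as the factored form $e^{-x}e^{-y}$ of the density suggests, that X and Y can be sampled as independent exponentials with rate 1; the function names are illustrative only.

```python
import random


def ratio_cdf(a):
    """Distribution function of X/Y derived above: 1 - 1/(a + 1)."""
    return 1 - 1 / (a + 1)


def estimate_ratio_cdf(a, n=200_000, seed=2):
    """Monte Carlo estimate of P{X/Y <= a} for independent rate-1 exponentials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.expovariate(1.0)
        y = rng.expovariate(1.0)
        hits += (x <= a * y)  # same event as x / y <= a, without dividing
    return hits / n


if __name__ == "__main__":
    for a in (0.5, 1.0, 3.0):
        print(a, estimate_ratio_cdf(a), "vs", ratio_cdf(a))
```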

We can also define joint probability distributions for n random variables in exactly the 
same manner as we did for n = 2. For instance, the joint cumulative probability 
distribution function $F(a_1, a_2, \ldots, a_n)$ of the n random variables $X_1, X_2, \ldots, X_n$ is defined
by

$$F(a_1, a_2, \ldots, a_n) = P\{X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n\}$$

Further, the n random variables are said to be jointly continuous if there exists a
function $f(x_1, x_2, \ldots, x_n)$, called the joint probability density function, such that, for any
set C in n-space,

$$P\{(X_1, X_2, \ldots, X_n) \in C\} = \iint \cdots \int_{(x_1, \ldots, x_n) \in C} f(x_1, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$$

In particular, for any n sets of real numbers $A_1, A_2, \ldots, A_n$,

$$P\{X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n\} = \int_{A_n} \int_{A_{n-1}} \cdots \int_{A_1} f(x_1, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n$$


Example 1g The Multinomial Distribution 


One of the most important joint distributions is the multinomial distribution, which 
arises when a sequence of n independent and identical experiments is 
performed. Suppose that each experiment can result in any one of r possible
outcomes, with respective probabilities $p_1, p_2, \ldots, p_r$, $\sum_{i=1}^{r} p_i = 1$. If we let $X_i$
denote the number of the n experiments that result in outcome number i, then

(1.5)

$$P\{X_1 = n_1, X_2 = n_2, \ldots, X_r = n_r\} = \frac{n!}{n_1!\,n_2! \cdots n_r!}\, p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$$

whenever $\sum_{i=1}^{r} n_i = n$.


Equation (1.5) is verified by noting that any sequence of outcomes for the n
experiments that leads to outcome i occurring $n_i$ times for i = 1, 2, ..., r will, by
the assumed independence of experiments, have probability $p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$ of
occurring. Because there are $n!/(n_1!\,n_2! \cdots n_r!)$ such sequences of outcomes
(there are $n!/(n_1! \cdots n_r!)$ different permutations of n things of which $n_1$ are alike, $n_2$
are alike, ..., $n_r$ are alike), Equation (1.5) is established. The joint distribution
whose joint probability mass function is specified by Equation (1.5) is called
the multinomial distribution. Note that when r = 2, the multinomial reduces to the
binomial distribution.


Note also that any sum of a fixed set of the $X_i$'s will have a binomial distribution.
That is, if $N \subset \{1, 2, \ldots, r\}$, then $\sum_{i \in N} X_i$ will be a binomial random variable with
parameters n and $p = \sum_{i \in N} p_i$. This follows because $\sum_{i \in N} X_i$ represents the
number of the n experiments whose outcome is in N, and each experiment will
independently have such an outcome with probability $\sum_{i \in N} p_i$.
ieN 


As an application of the multinomial distribution, suppose that a fair die is rolled 9 
times. The probability that 1 appears three times, 2 and 3 twice each, 4 and 5 
once each, and 6 not at all is 
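$$\frac{9!}{3!\,2!\,2!\,1!\,1!\,0!} \left(\frac{1}{6}\right)^3 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^1 \left(\frac{1}{6}\right)^0 = \frac{9!}{3!\,2!\,2!} \left(\frac{1}{6}\right)^9$$
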
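
Equation (1.5) is straightforward to evaluate exactly. The following Python sketch (the function name `multinomial_pmf` is illustrative, not from the text) computes the multinomial probability with exact fractions and applies it to the die example above.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """P{X_1 = n_1, ..., X_r = n_r} from Equation (1.5), as an exact fraction."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # each division is exact: a partial multinomial coefficient
    prob = Fraction(coef)
    for c, p in zip(counts, probs):
        prob *= Fraction(p) ** c
    return prob


if __name__ == "__main__":
    # Fair die rolled 9 times: 1 three times, 2 and 3 twice each, 4 and 5 once, 6 never.
    die = [Fraction(1, 6)] * 6
    p = multinomial_pmf((3, 2, 2, 1, 1, 0), die)
    print(p, float(p))
```

With r = 2 the same function returns the binomial probability, in line with the remark that the multinomial reduces to the binomial in that case.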
We can also use the multinomial distribution to analyze a variation of the 
classical birthday problem which asks for the probability that no 3 people in a 
group of size n have the same birthday when the birthdays of the n people are 
independent and each birthday is equally likely to be any of the 365 days of the 
year. Because this probability is 0 when n > 730 (why?), we will suppose
that n ≤ 730. To find the desired probability, note that there will be no set of 3
people having the same birthday if each of the 365 days of the year is the
birthday of at most 2 persons. Now, this will be the case if for some i ≤ n/2 the
event $A_i$ occurs, where $A_i$ is the event that the 365 days of the year can be
partitioned into three groups of respective sizes i, n − 2i, and 365 − n + i such
that every day in the first group is the birthday of exactly 2 of the n individuals, 
every day in the second group is the birthday of exactly 1 of the n individuals, 
and every day in the third group is the birthday of none of the n individuals. Now, 
because each day of the year is equally likely to be the birthday of an individual, 
it follows, for a given partition of the 365 days into three groups of respective 
sizes i, n − 2i, and 365 − n + i, that the probability each day in the first group is
the birthday of exactly 2 of the n individuals, each day in the second group is the 
birthday of exactly 1 of the n individuals, and each day in the third group is the 
birthday of none of the n individuals is equal to the multinomial probability 


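$$\frac{n!}{(2!)^i (1!)^{n-2i} (0!)^{365-n+i}} \left(\frac{1}{365}\right)^n$$

As the number of partitions of the 365 days of the year into 3 groups of
respective sizes i, n − 2i, 365 − n + i is $\frac{365!}{i!\,(n-2i)!\,(365-n+i)!}$, it follows that

$$P(A_i) = \frac{365!}{i!\,(n-2i)!\,(365-n+i)!} \cdot \frac{n!}{2^i} \left(\frac{1}{365}\right)^n, \quad i \le n/2$$

As the events $A_i$, i ≤ n/2, are mutually exclusive, we have that

$$P\{\text{no set of three with same birthday}\} = \sum_{i=0}^{[n/2]} \frac{365!}{i!\,(n-2i)!\,(365-n+i)!} \cdot \frac{n!}{2^i} \left(\frac{1}{365}\right)^n$$

When n = 88, the preceding gives

$$P\{\text{no set of three with same birthday}\} = \sum_{i=0}^{44} \frac{365!}{i!\,(88-2i)!\,(277+i)!} \cdot \frac{88!}{2^i} \left(\frac{1}{365}\right)^{88} \approx .504$$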
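
The sum over i is tedious by hand but easy to evaluate with exact integer arithmetic. The sketch below implements the formula for a general number of days and, as a check of the counting argument, compares it with direct enumeration on a toy calendar. The function names and the `days` parameter are illustrative additions, not part of the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def prob_no_triple(n, days=365):
    """P{no day is the birthday of 3 or more of the n people}, via the sum over i."""
    if n > 2 * days:
        return Fraction(0)
    total = 0
    for i in range(n // 2 + 1):  # i = number of days shared by exactly 2 people
        singles = n - 2 * i
        empty = days - n + i
        if empty < 0:
            continue
        partitions = factorial(days) // (
            factorial(i) * factorial(singles) * factorial(empty)
        )
        arrangements = factorial(n) // 2**i  # n! / ((2!)^i (1!)^(n-2i) (0!)^empty)
        total += partitions * arrangements
    return Fraction(total, days**n)


def brute_force_no_triple(n, days):
    """Direct enumeration of all days**n birthday assignments (tiny cases only)."""
    good = sum(
        1
        for assignment in product(range(days), repeat=n)
        if max(Counter(assignment).values()) <= 2
    )
    return Fraction(good, days**n)


if __name__ == "__main__":
    print(prob_no_triple(4, days=3), brute_force_no_triple(4, days=3))
    print(float(prob_no_triple(88)))
```

Calling `prob_no_triple(88)` evaluates the n = 88 sum displayed above.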


6.2 Independent Random Variables 


The random variables X and Y are said to be independent if, for any two sets of real 
numbers A and B, 


(2.1)
$$P\{X \in A, Y \in B\} = P\{X \in A\}\,P\{Y \in B\}$$


In other words, X and Y are independent if, for all A and B, the events $E_A = \{X \in A\}$
and $F_B = \{Y \in B\}$ are independent.


It can be shown by using the three axioms of probability that Equation (2.1) will
follow if and only if, for all a, b,

$$P\{X \le a, Y \le b\} = P\{X \le a\}\,P\{Y \le b\}$$

Hence, in terms of the joint distribution function F of X and Y, X and Y are
independent if

$$F(a, b) = F_X(a) F_Y(b) \quad \text{for all } a, b$$

When X and Y are discrete random variables, the condition of independence (2.1) is
equivalent to

(2.2)

$$p(x, y) = p_X(x)\,p_Y(y) \quad \text{for all } x, y$$


The equivalence follows because, if Equation (2.1) is satisfied, then we obtain
Equation (2.2) by letting A and B be, respectively, the one-point sets A = {x} and
B = {y}. Furthermore, if Equation (2.2) is valid, then for any sets A, B,

$$P\{X \in A, Y \in B\} = \sum_{y \in B} \sum_{x \in A} p(x, y)$$
$$= \sum_{y \in B} \sum_{x \in A} p_X(x)\,p_Y(y)$$
$$= \sum_{y \in B} p_Y(y) \sum_{x \in A} p_X(x)$$
$$= P\{Y \in B\}\,P\{X \in A\}$$

and Equation (2.1) is established.
In the jointly continuous case, the condition of independence is equivalent to 


$$f(x, y) = f_X(x)\,f_Y(y) \quad \text{for all } x, y$$


Thus, loosely speaking, X and Y are independent if knowing the value of one does 
not change the distribution of the other. Random variables that are not independent 
are said to be dependent. 


Example 2a 


Suppose that n + m independent trials having a common probability of success p 
are performed. If X is the number of successes in the first n trials, and Y is the 
number of successes in the final m trials, then X and Y are independent, since 
knowing the number of successes in the first n trials does not affect the 
distribution of the number of successes in the final m trials (by the assumption of
independent trials). In fact, for integral x and y,

$$P\{X = x, Y = y\} = \binom{n}{x} p^x (1-p)^{n-x} \binom{m}{y} p^y (1-p)^{m-y}, \quad 0 \le x \le n,\ 0 \le y \le m$$
$$= P\{X = x\}\,P\{Y = y\}$$


In contrast, X and Z will be dependent, where Z is the total number of successes 
in the n + m trials. (Why?) 


Example 2b 


Suppose that the number of people who enter a post office on a given day is a 
Poisson random variable with parameter λ. Show that if each person who enters
the post office is a male with probability p and a female with probability 1 − p,
then the number of males and females entering the post office are independent
Poisson random variables with respective parameters λp and λ(1 − p).

Solution


Let X and Y denote, respectively, the number of males and females that enter the 
post office. We shall show the independence of X and Y by establishing 
Equation (2.2). To obtain an expression for P{X = i, Y = j}, we condition on
whether or not X + Y = i + j. This gives:

$$P\{X = i, Y = j\} = P\{X = i, Y = j \mid X + Y = i + j\}\,P\{X + Y = i + j\}$$
$$\quad + P\{X = i, Y = j \mid X + Y \ne i + j\}\,P\{X + Y \ne i + j\}$$

[Note that this equation is merely a special case of the formula
$P(E) = P(E|F)P(F) + P(E|F^c)P(F^c)$.]


Since $P\{X = i, Y = j \mid X + Y \ne i + j\}$ is clearly 0, we obtain

(2.3)

$$P\{X = i, Y = j\} = P\{X = i, Y = j \mid X + Y = i + j\}\,P\{X + Y = i + j\}$$

Now, because X + Y is the total number of people who enter the post office, it
follows, by assumption, that

(2.4)

$$P\{X + Y = i + j\} = e^{-\lambda} \frac{\lambda^{i+j}}{(i+j)!}$$

Furthermore, given that i + j people do enter the post office, since each person
entering will be male with probability p, it follows that the probability that exactly i
of them will be male (and thus j of them female) is just the binomial probability
$\binom{i+j}{i} p^i (1-p)^j$. That is,

(2.5)

$$P\{X = i, Y = j \mid X + Y = i + j\} = \binom{i+j}{i} p^i (1-p)^j$$

Substituting Equations (2.4) and (2.5) into Equation (2.3) yields

(2.6)

$$P\{X = i, Y = j\} = \binom{i+j}{i} p^i (1-p)^j e^{-\lambda} \frac{\lambda^{i+j}}{(i+j)!}$$
$$= e^{-\lambda} \frac{(\lambda p)^i\,[\lambda(1-p)]^j}{i!\,j!}$$
$$= \frac{e^{-\lambda p} (\lambda p)^i}{i!} \cdot \frac{e^{-\lambda(1-p)} [\lambda(1-p)]^j}{j!}$$

Hence,

(2.7)

$$P\{X = i\} = e^{-\lambda p} \frac{(\lambda p)^i}{i!} \sum_j e^{-\lambda(1-p)} \frac{[\lambda(1-p)]^j}{j!} = e^{-\lambda p} \frac{(\lambda p)^i}{i!}$$

and similarly,

(2.8)

$$P\{Y = j\} = e^{-\lambda(1-p)} \frac{[\lambda(1-p)]^j}{j!}$$

Equations (2.6), (2.7), and (2.8) establish the desired result.
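
The identity behind Example 2b can be spot-checked numerically: the joint probability obtained by conditioning on the total should equal the product of two Poisson probabilities. The sketch below is only an illustration; the function names and the sample values λ = 4 and p = 0.3 are arbitrary choices.

```python
from math import comb, exp, factorial, isclose


def poisson_pmf(k, mean):
    """P{N = k} for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


def joint_by_conditioning(i, j, lam, p):
    """P{X = i, Y = j}: binomial split of a Poisson(lam) total of i + j arrivals."""
    return comb(i + j, i) * p**i * (1 - p) ** j * poisson_pmf(i + j, lam)


def joint_by_product(i, j, lam, p):
    """Product of Poisson(lam * p) and Poisson(lam * (1 - p)) probabilities."""
    return poisson_pmf(i, lam * p) * poisson_pmf(j, lam * (1 - p))


if __name__ == "__main__":
    lam, p = 4.0, 0.3
    for i in range(4):
        for j in range(4):
            a = joint_by_conditioning(i, j, lam, p)
            b = joint_by_product(i, j, lam, p)
            print(i, j, round(a, 6), round(b, 6), isclose(a, b, rel_tol=1e-9))
```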


Example 2c 


A man and a woman decide to meet at a certain location. If each of them 
independently arrives at a time uniformly distributed between 12 noon and 1 P.M., 
find the probability that the first to arrive has to wait longer than 10 minutes. 


Solution 


If we let X and Y denote, respectively, the time past 12 that the man and the 
woman arrive, then X and Y are independent random variables, each of which is 
uniformly distributed over (0, 60). The desired probability, 

P{X +10 <Y}+ P{Y + 10 < X}, which, by symmetry, equals 2P{X + 10 < Y}, is 
obtained as follows: 


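$$2P\{X + 10 < Y\} = 2 \iint_{x + 10 < y} f(x, y)\,dx\,dy$$
$$= 2 \iint_{x + 10 < y} f_X(x)\,f_Y(y)\,dx\,dy$$
$$= 2 \int_{10}^{60} \int_0^{y-10} \left(\frac{1}{60}\right)^2 dx\,dy$$
$$= \frac{2}{(60)^2} \int_{10}^{60} (y - 10)\,dy$$
$$= \frac{25}{36}$$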
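
The answer 25/36 can be confirmed by simulating the two arrival times. In the sketch below, the closed form `((window - wait) / window) ** 2` is a generalization introduced here for convenience (it reduces to 25/36 for a 60-minute window and a 10-minute wait); the function and parameter names are hypothetical.

```python
import random


def exact_wait_probability(wait=10.0, window=60.0):
    """P{|X - Y| > wait} for independent uniform(0, window) arrival times."""
    return ((window - wait) / window) ** 2


def estimate_wait_probability(wait=10.0, window=60.0, n=200_000, seed=3):
    """Monte Carlo estimate of the probability that the first arrival waits > wait."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(0, window)  # arrival time of the man
        y = rng.uniform(0, window)  # arrival time of the woman
        hits += (abs(x - y) > wait)
    return hits / n


if __name__ == "__main__":
    print(exact_wait_probability(), estimate_wait_probability())
```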


Our next example presents the oldest problem dealing with geometrical probabilities. 
It was first considered and solved by Buffon, a French naturalist of the eighteenth 
century, and is usually referred to as Buffon’s needle problem. 


Example 2d Buffon’s Needle Problem 


A table is ruled with equidistant parallel lines a distance D apart. A needle of 
length L, where L < D, is randomly thrown on the table. What is the probability 
that the needle will intersect one of the lines (the other possibility being that the 
needle will be completely contained in the strip between two lines)? 


Solution 


Let us determine the position of the needle by specifying (1) the distance X from
the middle point of the needle to the nearest parallel line and (2) the angle θ
between the needle and the projected line of length X. (See Figure 6.2.) The
needle will intersect a line if the hypotenuse of the right triangle in Figure 6.2
is less than L/2; that is, if

$$\frac{X}{\cos\theta} < \frac{L}{2} \quad \text{or} \quad X < \frac{L}{2}\cos\theta$$


Figure 6.2


As X varies between 0 and D/2 and θ between 0 and π/2, it is reasonable to
assume that they are independent, uniformly distributed random variables over
these respective ranges. Hence,

$$P\left\{X < \frac{L}{2}\cos\theta\right\} = \iint_{x < (L/2)\cos y} f_X(x)\,f_\theta(y)\,dx\,dy$$
$$= \frac{4}{\pi D} \int_0^{\pi/2} \int_0^{(L/2)\cos y} dx\,dy$$
$$= \frac{4}{\pi D} \int_0^{\pi/2} \frac{L}{2}\cos y\,dy$$
$$= \frac{2L}{\pi D}$$
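
Buffon's result lends itself to a direct simulation under the same modeling assumptions as the solution: X uniform on (0, D/2), θ uniform on (0, π/2), and the two independent. The function names and default values below (L = 1, D = 2) are illustrative only.

```python
import math
import random


def exact_buffon(L=1.0, D=2.0):
    """Probability that the needle crosses a line: 2L / (pi D), valid for L <= D."""
    return 2 * L / (math.pi * D)


def estimate_buffon(L=1.0, D=2.0, n=200_000, seed=4):
    """Monte Carlo estimate using the (X, theta) parametrization of the solution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(0, D / 2)            # distance from needle midpoint to nearest line
        theta = rng.uniform(0, math.pi / 2)  # angle between needle and the perpendicular
        hits += (x < (L / 2) * math.cos(theta))
    return hits / n


if __name__ == "__main__":
    print(exact_buffon(), estimate_buffon())
```

Because the crossing probability is 2L/(πD), the same simulation can be turned around to give a (slowly converging) estimate of π.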
Example 2e Characterization of the Normal Distribution


Let X and Y denote the horizontal and vertical miss distances when a bullet is 
fired at a target, and assume that 


1. X and Y are independent continuous random variables having 
differentiable density functions. 

2. The joint density $f(x, y) = f_X(x) f_Y(y)$ of X and Y depends on (x, y) only
through $x^2 + y^2$.


Loosely put, assumption 2 states that the probability of the bullet landing on any 
point of the x—y plane depends only on the distance of the point from the target 
and not on its angle of orientation. An equivalent way of phrasing this assumption 
is to say that the joint density function is rotation invariant. 


It is a rather interesting fact that assumptions 1 and 2 imply that X and Y are 
normally distributed random variables. To prove this, note first that the 
assumptions yield the relation 


(2.9)
$$f(x, y) = f_X(x)\,f_Y(y) = g(x^2 + y^2)$$

for some function g. Differentiating Equation (2.9) with respect to x yields

(2.10)
$$f'_X(x)\,f_Y(y) = 2x\,g'(x^2 + y^2)$$

Dividing Equation (2.10) by Equation (2.9) gives

$$\frac{f'_X(x)}{f_X(x)} = \frac{2x\,g'(x^2 + y^2)}{g(x^2 + y^2)}$$

or

(2.11)

$$\frac{f'_X(x)}{2x\,f_X(x)} = \frac{g'(x^2 + y^2)}{g(x^2 + y^2)}$$

Because the value of the left-hand side of Equation (2.11) depends only on x,
whereas the value of the right-hand side depends on $x^2 + y^2$, it follows that the
left-hand side must be the same for all x. To see this, consider any $x_1$, $x_2$ and let
$y_1$, $y_2$ be such that $x_1^2 + y_1^2 = x_2^2 + y_2^2$. Then, from Equation (2.11), we
obtain

$$\frac{f'_X(x_1)}{2x_1 f_X(x_1)} = \frac{g'(x_1^2 + y_1^2)}{g(x_1^2 + y_1^2)} = \frac{g'(x_2^2 + y_2^2)}{g(x_2^2 + y_2^2)} = \frac{f'_X(x_2)}{2x_2 f_X(x_2)}$$

Hence,

$$\frac{f'_X(x)}{x\,f_X(x)} = c \quad \text{or} \quad \frac{d}{dx}\big(\log f_X(x)\big) = cx$$

which implies, upon integration of both sides, that

$$\log f_X(x) = a + \frac{c x^2}{2} \quad \text{or} \quad f_X(x) = k e^{c x^2 / 2}$$

Since $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, it follows that c is necessarily negative, and we may
write $c = -1/\sigma^2$. Thus,

$$f_X(x) = k e^{-x^2 / 2\sigma^2}$$

That is, X is a normal random variable with parameters μ = 0 and $\sigma^2$. A similar
argument can be applied to $f_Y(y)$ to show that

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\bar{\sigma}} e^{-y^2 / 2\bar{\sigma}^2}$$

Furthermore, it follows from assumption 2 that $\bar{\sigma}^2 = \sigma^2$ and that X and Y are thus
independent, identically distributed normal random variables with parameters
μ = 0 and $\sigma^2$.


A necessary and sufficient condition for the random variables X and Y to be 
independent is for their joint probability density function (or joint probability mass 
function in the discrete case) f(x, y) to factor into two terms, one depending only on 
x and the other depending only on y.


Proposition 2.1 


The continuous (discrete) random variables X and Y are independent if and only 
if their joint probability density (mass) function can be expressed as 


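$$f_{X,Y}(x, y) = h(x)\,g(y), \quad -\infty < x < \infty,\ -\infty < y < \infty$$


Proof Let us give the proof in the continuous case. First, note that independence
implies that the joint density is the product of the marginal densities of X and Y,
so the preceding factorization will hold when the random variables are
independent. Now, suppose that

$$f_{X,Y}(x, y) = h(x)\,g(y)$$

Then

$$1 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy$$
$$= \int_{-\infty}^{\infty} h(x)\,dx \int_{-\infty}^{\infty} g(y)\,dy$$
$$= C_1 C_2$$

where $C_1 = \int_{-\infty}^{\infty} h(x)\,dx$ and $C_2 = \int_{-\infty}^{\infty} g(y)\,dy$. Also,

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = C_2\,h(x)$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx = C_1\,g(y)$$

Since $C_1 C_2 = 1$, it follows that

$$f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$$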


and the proof is complete. 
Example 2f 
If the joint density function of X and Y is 


$$f(x, y) = 6e^{-2x}e^{-3y} \qquad 0 < x < \infty,\ 0 < y < \infty$$

and is equal to 0 outside this region, are the random variables independent?
What if the joint density function is

$$f(x, y) = 24xy \qquad 0 < x < 1,\ 0 < y < 1,\ 0 < x + y < 1$$

and is equal to 0 otherwise?


Solution 


In the first instance, the joint density function factors, and thus the random
variables are independent (with one being exponential with rate 2 and the other
exponential with rate 3). In the second instance, because the region in which the
joint density is nonzero cannot be expressed in the form $x \in A$, $y \in B$, the joint
density does not factor, so the random variables are not independent. This can
be seen clearly by letting

$$I(x, y) = \begin{cases} 1 & \text{if } 0 < x < 1,\ 0 < y < 1,\ 0 < x + y < 1 \\ 0 & \text{otherwise} \end{cases}$$

and writing

$$f(x, y) = 24xy\, I(x, y)$$

which clearly does not factor into a part depending only on x and another
depending only on y.
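The failure to factor in the second part of Example 2f can be checked numerically. The following is a small sketch, not part of the original text: the function names are illustrative, and the marginal $12t(1-t)^2$ is obtained by integrating $24ty$ over $0 < y < 1 - t$. At a point outside the triangle the joint density is 0 while the product of the marginals is positive.

```python
def joint(x, y):
    """Second density of Example 2f: 24xy on the region x, y > 0, x + y < 1."""
    inside = 0 < x < 1 and 0 < y < 1 and x + y < 1
    return 24 * x * y if inside else 0.0


def marginal(t):
    """Marginal density of either coordinate: integral of 24*t*y over 0 < y < 1 - t."""
    return 12 * t * (1 - t) ** 2 if 0 < t < 1 else 0.0


# (0.6, 0.6) lies outside the triangle, so the joint density vanishes there,
# yet each marginal is positive; hence the joint density is not the product.
x, y = 0.6, 0.6
print(joint(x, y), marginal(x) * marginal(y))
```

Independence would require the two printed numbers to agree at every point, which they do not here.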


The concept of independence may, of course, be defined for more than two random
variables. In general, the n random variables $X_1, X_2, \ldots, X_n$ are said to be independent
if, for all sets of real numbers $A_1, A_2, \ldots, A_n$,

$$P\{X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n\} = \prod_{i=1}^{n} P\{X_i \in A_i\}$$

As before, it can be shown that this condition is equivalent to

$$P\{X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n\} = \prod_{i=1}^{n} P\{X_i \le a_i\} \qquad \text{for all } a_1, a_2, \ldots, a_n$$


Finally, we say that an infinite collection of random variables is independent if every 
finite subcollection of them is independent. 


Example 2g How can a computer choose a random subset? 


Most computers are able to generate the value of, or simulate, a uniform (0, 1) 
random variable by means of a built-in subroutine that (to a high degree of 
approximation) produces such “random numbers.” As a result, it is quite easy for 
a computer to simulate an indicator (that is, a Bernoulli) random variable.
Suppose I is an indicator variable such that

$$P\{I = 1\} = p = 1 - P\{I = 0\}$$

The computer can simulate I by choosing a uniform (0, 1) random number U and
then letting

$$I = \begin{cases} 1 & \text{if } U < p \\ 0 & \text{if } U \ge p \end{cases}$$


Suppose that we are interested in having the computer select $k$, $k \le n$, of the
numbers $1, 2, \ldots, n$ in such a way that each of the $\binom{n}{k}$ subsets of size k is equally
likely to be chosen. We now present a method that will enable the computer to
solve this task. To generate such a subset, we will first simulate, in sequence, n
indicator variables $I_1, I_2, \ldots, I_n$, of which exactly k will equal 1. Those i for which
$I_i = 1$ will then constitute the desired subset.

To generate the random variables $I_1, \ldots, I_n$, start by simulating n independent
uniform (0, 1) random variables $U_1, U_2, \ldots, U_n$. Now define

$$I_1 = \begin{cases} 1 & \text{if } U_1 < \dfrac{k}{n} \\ 0 & \text{otherwise} \end{cases}$$

and then, once $I_1, \ldots, I_i$ are determined, recursively set

$$I_{i+1} = \begin{cases} 1 & \text{if } U_{i+1} < \dfrac{k - (I_1 + \cdots + I_i)}{n - i} \\ 0 & \text{otherwise} \end{cases}$$


In words, at the (i + 1)th stage, we set $I_{i+1}$ equal to 1 (and thus put i + 1 into the
desired subset) with a probability equal to the remaining number of places in the
subset (namely, $k - \sum_{j=1}^{i} I_j$), divided by the remaining number of possibilities
(namely, $n - i$). Hence, the joint distribution of $I_1, I_2, \ldots, I_n$ is determined from

$$P\{I_1 = 1\} = \frac{k}{n}$$

$$P\{I_{i+1} = 1 \mid I_1, \ldots, I_i\} = \frac{k - \sum_{j=1}^{i} I_j}{n - i} \qquad 1 \le i < n$$

The proof that the preceding formula results in all subsets of size k being equally
likely to be chosen is by induction on k + n. It is immediate when k + n = 2 (that
is, when k = 1, n = 1), so assume it to be true whenever $k + n \le l$. Now,
suppose that $k + n = l + 1$, and consider any subset of size k, say
$i_1 < i_2 < \cdots < i_k$, and consider the following two cases.


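Case 1: $i_1 = 1$

$$\begin{aligned}
&P\{I_1 = I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise}\} \\
&\quad = P\{I_1 = 1\}\, P\{I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise} \mid I_1 = 1\}
\end{aligned}$$

Now given that $I_1 = 1$, the remaining elements of the subset are chosen as if a
subset of size $k - 1$ were to be chosen from the $n - 1$ elements $2, 3, \ldots, n$.
Hence, by the induction hypothesis, the conditional probability that this will result
in a given subset of size $k - 1$ being selected is $1/\binom{n-1}{k-1}$. Hence,

$$P\{I_1 = I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise}\} = \frac{k}{n}\,\frac{1}{\binom{n-1}{k-1}} = \frac{1}{\binom{n}{k}}$$

Case 2: $i_1 \ne 1$

$$\begin{aligned}
&P\{I_{i_1} = I_{i_2} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise}\} \\
&\quad = P\{I_{i_1} = \cdots = I_{i_k} = 1,\ I_j = 0 \text{ otherwise} \mid I_1 = 0\}\, P\{I_1 = 0\} \\
&\quad = \frac{1}{\binom{n-1}{k}}\left(1 - \frac{k}{n}\right) = \frac{1}{\binom{n}{k}}
\end{aligned}$$

where the induction hypothesis was used to evaluate the preceding conditional
probability.

Thus, in all cases, the probability that a given subset of size k will be the subset
chosen is $1/\binom{n}{k}$.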


Remark The foregoing method for generating a random subset has a very low
memory requirement. A faster algorithm that requires somewhat more memory is
presented in Section 10.1. (The latter algorithm uses the last k elements of a
random permutation of 1, 2, ..., n.)
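The sequential method of Example 2g translates almost line for line into code. The following is a minimal sketch, assuming Python's standard `random` module as the source of uniform (0, 1) numbers; the function name `random_subset` and its parameters are illustrative and do not come from the text.

```python
import random


def random_subset(n, k, rng=None):
    """Select k of the numbers 1..n so that every size-k subset is equally likely.

    Element i + 1 is accepted with probability
    (places still to fill) / (elements not yet considered),
    which is the recursion for the indicators I_1, ..., I_n.
    """
    rng = rng or random.Random()
    chosen = []
    for i in range(n):                      # i elements have been considered so far
        places_left = k - len(chosen)       # k - (I_1 + ... + I_i)
        if rng.random() < places_left / (n - i):
            chosen.append(i + 1)
    return chosen


print(random_subset(10, 4, random.Random(1)))
```

Only the running count of chosen elements is needed while scanning, which is the low memory requirement noted in the Remark.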
Example 2h


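Let X, Y, Z be independent and uniformly distributed over (0, 1). Compute
$P\{X \ge YZ\}$.

Solution

Since

$$f_{X,Y,Z}(x, y, z) = f_X(x) f_Y(y) f_Z(z) = 1, \qquad 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1$$

we have

$$\begin{aligned}
P\{X \ge YZ\} &= \iiint_{x \ge yz} f_{X,Y,Z}(x, y, z)\,dx\,dy\,dz \\
&= \int_0^1\!\!\int_0^1\!\!\int_{yz}^1 dx\,dy\,dz \\
&= \int_0^1\!\!\int_0^1 (1 - yz)\,dy\,dz \\
&= \int_0^1 \left(1 - \frac{z}{2}\right) dz \\
&= \frac{3}{4}
\end{aligned}$$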
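The value 3/4 can be sanity-checked by simulation. The sketch below is an illustration only; the function name, the number of trials, and the seed are arbitrary choices rather than anything prescribed by the text.

```python
import random


def estimate_probability(trials, seed=0):
    """Monte Carlo estimate of P{X >= YZ} for independent uniform (0, 1) X, Y, Z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y, z = rng.random(), rng.random(), rng.random()
        if x >= y * z:
            hits += 1
    return hits / trials


print(estimate_probability(200_000))  # should land close to 0.75
```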


Example 2i Probabilistic Interpretation of Half-Life 


Let N(t) denote the number of nuclei contained in a radioactive mass of material
at time t. The concept of half-life is often defined in a deterministic fashion by
stating that it is an empirical fact that, for some value h, called the half-life,

$$N(t) = 2^{-t/h} N(0) \qquad t > 0$$

[Note that N(h) = N(0)/2.] Since the preceding implies that, for any nonnegative
s and t,

$$N(t + s) = 2^{-(s+t)/h} N(0) = 2^{-t/h} N(s)$$

it follows that no matter how much time s has already elapsed, in an additional
time t, the number of existing nuclei will decrease by the factor $2^{-t/h}$.


Because the deterministic relationship just given results from observations of 
radioactive masses containing huge numbers of nuclei, it would seem that it 
might be consistent with a probabilistic interpretation. The clue to deriving the 
appropriate probability model for half-life resides in the empirical observation that 
the proportion of decay in any time interval depends neither on the total number
of nuclei at the beginning of the interval nor on the location of this interval [since
N(t + s)/N(s) depends neither on N(s) nor on s]. Thus, it appears that the 
individual nuclei act independently and with a memoryless life distribution. 
Consequently, since the unique life distribution that is memoryless is the 
exponential distribution, and since exactly one-half of a given amount of mass 
decays every h time units, we propose the following probabilistic model for 
radioactive decay. 


Probabilistic interpretation of the half-life h: The lifetimes of the individual
nuclei are independent random variables having a life distribution that is
exponential with median equal to h. That is, if L represents the lifetime of a given
nucleus, then

$$P\{L < t\} = 1 - 2^{-t/h}$$

(Because $P\{L < h\} = \frac{1}{2}$ and the preceding can be written as

$$P\{L < t\} = 1 - \exp\left\{-t\,\frac{\log 2}{h}\right\}$$

it can be seen that L indeed has an exponential distribution with median h.)


Note that under the probabilistic interpretation of half-life just given, if one starts
with N(0) nuclei at time 0, then N(t), the number of nuclei that remain at time t,
will have a binomial distribution with parameters n = N(0) and $p = 2^{-t/h}$.
Results of Chapter 8 will show that this interpretation of half-life is consistent
with the deterministic model when considering the proportion of a large number
of nuclei that decay over a given time frame. However, the difference between
the deterministic and probabilistic interpretation becomes apparent when one
considers the actual number of decayed nuclei. We will now indicate this with
regard to the question of whether protons decay.


There is some controversy over whether or not protons decay. Indeed, one
theory predicts that protons should decay with a half-life of about $h = 10^{30}$ years.
To check this prediction empirically, it has been suggested that one follow a large
number of protons for, say, one or two years and determine whether any of them
decay within that period. (Clearly, it would not be feasible to follow a mass of
protons for $10^{30}$ years to see whether one-half of it decays.) Let us suppose that
we are able to keep track of $N(0) = 10^{30}$ protons for c years. The number of
decays predicted by the deterministic model would then be given by

$$\begin{aligned}
N(0) - N(c) &= h(1 - 2^{-c/h}) \\
&= \frac{1 - 2^{-c/h}}{1/h} \\
&\approx \lim_{x \to 0} \frac{1 - 2^{-cx}}{x} \qquad \text{since } \frac{1}{h} = 10^{-30} \approx 0 \\
&= \lim_{x \to 0} \left(c\, 2^{-cx} \log 2\right) \qquad \text{by L'Hôpital's rule} \\
&= c \log 2 \approx .6931c
\end{aligned}$$


For instance, the deterministic model predicts that in 2 years there should be
1.3863 decays, and it would thus appear to be a serious blow to the hypothesis
that protons decay with a half-life of $10^{30}$ years if no decays are observed over
those 2 years.


Let us now contrast the conclusions just drawn with those obtained from the
probabilistic model. Again, let us consider the hypothesis that the half-life of
protons is $h = 10^{30}$ years, and suppose that we follow h protons for c years.
Since there is a huge number of independent protons, each of which will have a
very small probability of decaying within this time period, it follows that the
number of protons that decay will have (to a very strong approximation) a
Poisson distribution with parameter equal to $h(1 - 2^{-c/h}) \approx c \log 2$. Thus,

$$P\{0 \text{ decays}\} = e^{-c \log 2} = e^{-\log(2^c)} = \frac{1}{2^c}$$

and, in general,

$$P\{n \text{ decays}\} = \frac{2^{-c}\,[c \log 2]^n}{n!} \qquad n \ge 0$$


Thus, we see that even though the average number of decays over 2 years is (as 
predicted by the deterministic model) 1.3863, there is 1 chance in 4 that there will 
not be any decays, thereby indicating that such a result in no way invalidates the 
original hypothesis of proton decay. 
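The figures quoted for c = 2 years follow directly from the Poisson mass function with mean c log 2. A short sketch using only the standard `math` module (the helper name `poisson_pmf` is illustrative):

```python
import math


def poisson_pmf(n, mean):
    """P{n events} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


c = 2.0                          # years of observation
mean_decays = c * math.log(2)    # Poisson parameter c log 2
p_none = poisson_pmf(0, mean_decays)

print(round(mean_decays, 4), round(p_none, 4))
```

The mean is about 1.3863 decays, while the probability of observing none is 1/4.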


Remarks Independence is a symmetric relation. The random variables X and Y are
independent if their joint density function (or mass function in the discrete case) is 
the product of their individual density (or mass) functions. Therefore, to say that X is 
independent of Y is equivalent to saying that Y is independent of X — or just that X 
and Y are independent. As a result, in considering whether X is independent of Y in 
situations where it is not at all intuitive that knowing the value of Y will not change the 
probabilities concerning X, it can be beneficial to interchange the roles of X and Y
and ask instead whether Y is independent of X. The next example illustrates this
point. 


Example 2j 


If the initial throw of the dice in the game of craps results in the sum of the dice 
equaling 4, then the player will continue to throw the dice until the sum is either 4 
or 7. If this sum is 4, then the player wins, and if it is 7, then the player loses. Let 
N denote the number of throws needed until either 4 or 7 appears, and let X 
denote the value (either 4 or 7) of the final throw. Is N independent of X? That is, 
does knowing which of 4 or 7 occurs first affect the distribution of the number of 
throws needed until that number appears? Most people do not find the answer to 
this question to be intuitively obvious. However, suppose that we turn it around 
and ask whether X is independent of N. That is, does knowing how many throws 
it takes to obtain a sum of either 4 or 7 affect the probability that that sum is 
equal to 4? For instance, suppose we know that it takes n throws of the dice to 
obtain a sum of either 4 or 7. Does this affect the probability distribution of the 
final sum? Clearly not, since all that is important is that its value is either 4 or 7, 
and the fact that none of the first n — 1 throws were either 4 or 7 does not change 
the probabilities for the nth throw. Thus, we can conclude that X is independent 
of N, or equivalently, that N is independent of X.
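The conclusion of Example 2j can also be confirmed by exact arithmetic. The sketch below assumes two fair dice, so that a single throw sums to 4 with probability 3/36 and to 7 with probability 6/36; the function names are illustrative. It computes $P\{X = 4 \mid N = n\}$ for several n and shows that the value does not depend on n.

```python
from fractions import Fraction

P4 = Fraction(3, 36)        # P(sum = 4) on one throw of two fair dice
P7 = Fraction(6, 36)        # P(sum = 7)
NEITHER = 1 - P4 - P7       # throw is neither 4 nor 7


def joint(n, x):
    """P{N = n, X = x}: n - 1 throws that are neither 4 nor 7, then the sum x."""
    return NEITHER ** (n - 1) * (P4 if x == 4 else P7)


def cond_prob_four(n):
    """P{X = 4 | N = n}."""
    return joint(n, 4) / (joint(n, 4) + joint(n, 7))


print([cond_prob_four(n) for n in range(1, 6)])
```

Every entry equals 3/(3 + 6) = 1/3, the unconditional probability that 4 appears before 7.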


As another example, let $X_1, X_2, \ldots$ be a sequence of independent and identically
distributed continuous random variables, and suppose that we observe these
random variables in sequence. If $X_n > X_i$ for each $i = 1, \ldots, n - 1$, then we say
that $X_n$ is a record value. That is, each random variable that is larger than all
those preceding it is called a record value. Let $A_n$ denote the event that $X_n$ is a
record value. Is $A_{n+1}$ independent of $A_n$? That is, does knowing that the nth
random variable is the largest of the first n change the probability that the (n + 1)st
random variable is the largest of the first n + 1? While it is true that $A_{n+1}$ is
independent of $A_n$, this may not be intuitively obvious. However, if we turn the
question around and ask whether $A_n$ is independent of $A_{n+1}$, then the result is
more easily understood. For knowing that the (n + 1)st value is larger than
$X_1, \ldots, X_n$ clearly gives us no information about the relative size of $X_n$ among the
first n random variables. Indeed, by symmetry, it is clear that each of these n
random variables is equally likely to be the largest of this set, so
$P(A_n \mid A_{n+1}) = P(A_n) = 1/n$. Hence, we can conclude that $A_n$ and $A_{n+1}$ are
independent events.


Remark It follows from the identity 


$$\begin{aligned}
&P\{X_1 \le a_1, \ldots, X_n \le a_n\} \\
&\quad = P\{X_1 \le a_1\}\, P\{X_2 \le a_2 \mid X_1 \le a_1\} \cdots P\{X_n \le a_n \mid X_1 \le a_1, \ldots, X_{n-1} \le a_{n-1}\}
\end{aligned}$$

that the independence of $X_1, \ldots, X_n$ can be established sequentially. That is, we
can show that these random variables are independent by showing that

$X_2$ is independent of $X_1$
$X_3$ is independent of $X_1, X_2$
$X_4$ is independent of $X_1, X_2, X_3$
...
$X_n$ is independent of $X_1, \ldots, X_{n-1}$


6.3 Sums of Independent Random Variables 


It is often important to be able to calculate the distribution of X + Y from the 
distributions of X and Y when X and Y are independent. Suppose that X and Y are 
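independent, continuous random variables having probability density functions $f_X$
and $f_Y$. The cumulative distribution function of X + Y is obtained as follows:

$$\begin{aligned}
F_{X+Y}(a) &= P\{X + Y \le a\} \\
&= \iint_{x + y \le a} f_X(x) f_Y(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{a-y} f_X(x) f_Y(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{a-y} f_X(x)\,dx\, f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} F_X(a - y) f_Y(y)\,dy
\end{aligned} \tag{3.1}$$

The cumulative distribution function $F_{X+Y}$ is called the convolution of the distributions
$F_X$ and $F_Y$ (the cumulative distribution functions of X and Y, respectively).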


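By differentiating Equation (3.1), we find that the probability density function $f_{X+Y}$
of X + Y is given by

$$\begin{aligned}
f_{X+Y}(a) &= \frac{d}{da}\int_{-\infty}^{\infty} F_X(a - y) f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} \frac{d}{da} F_X(a - y) f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} f_X(a - y) f_Y(y)\,dy
\end{aligned} \tag{3.2}$$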


6.3.1 Identically Distributed Uniform Random Variables 


It is not difficult to determine the density function of the sum of two independent 
uniform (0,1) random variables. 


Example 3a Sum of two independent uniform random variables 


If X and Y are independent random variables, both uniformly distributed on (0, 1), 
calculate the probability density of X+ Y. 


Solution 


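From Equation (3.2), since

$$f_X(a) = f_Y(a) = \begin{cases} 1 & 0 < a < 1 \\ 0 & \text{otherwise} \end{cases}$$

we obtain

$$f_{X+Y}(a) = \int_0^1 f_X(a - y)\,dy$$

For $0 \le a \le 1$, this yields

$$f_{X+Y}(a) = \int_0^a dy = a$$

For $1 < a < 2$, we get

$$f_{X+Y}(a) = \int_{a-1}^1 dy = 2 - a$$

Hence,

$$f_{X+Y}(a) = \begin{cases} a & 0 \le a \le 1 \\ 2 - a & 1 < a < 2 \\ 0 & \text{otherwise} \end{cases}$$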


Because of the shape of its density function (see Figure 6.3), the random
variable X + Y is said to have a triangular distribution.
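The convolution integral of Equation (3.2) can be approximated numerically and compared with the triangular density. The following is a rough sketch under stated assumptions: a simple midpoint rule over 0 < y < 1 is used, and the function names and step count are illustrative rather than part of the text.

```python
def uniform_density(t):
    """Density of a uniform (0, 1) random variable."""
    return 1.0 if 0 < t < 1 else 0.0


def convolve_at(f_x, f_y, a, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of f_x(a - y) * f_y(y) over lo < y < hi."""
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        y = lo + (j + 0.5) * width
        total += f_x(a - y) * f_y(y)
    return total * width


def triangular(a):
    """Closed-form density of the sum of two independent uniform (0, 1) variables."""
    if 0 <= a <= 1:
        return a
    if 1 < a < 2:
        return 2 - a
    return 0.0


for a in (0.25, 0.5, 1.0, 1.5, 1.9):
    approx = convolve_at(uniform_density, uniform_density, a, 0.0, 1.0)
    print(a, round(approx, 3), triangular(a))
```

The same `convolve_at` helper can be reused for other pairs of densities, provided the integration limits cover the support of the second density.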


Figure 6.3 Triangular density function. [Plot of f(a) against a, with the a-axis marked at 0, 1, and 2.]


Now, suppose that $X_1, X_2, \ldots, X_n$ are independent uniform (0, 1) random variables,
and let

$$F_n(x) = P\{X_1 + \cdots + X_n \le x\}$$

Whereas a general formula for $F_n(x)$ is messy, it has a particularly nice form when
$x \le 1$. Indeed, we now use mathematical induction to prove that

$$F_n(x) = x^n/n!, \qquad 0 \le x \le 1$$

Because the preceding equation is true for n = 1, assume that

$$F_{n-1}(x) = x^{n-1}/(n-1)!, \qquad 0 \le x \le 1$$

Now, writing

$$\sum_{i=1}^{n} X_i = \sum_{i=1}^{n-1} X_i + X_n$$

and using the fact that the $X_i$ are all nonnegative, we see from Equation (3.1)
that, for $0 \le x \le 1$,

$$\begin{aligned}
F_n(x) &= \int_0^1 F_{n-1}(x - y) f_{X_n}(y)\,dy \\
&= \frac{1}{(n-1)!}\int_0^x (x - y)^{n-1}\,dy \qquad \text{by the induction hypothesis} \\
&= x^n/n!
\end{aligned}$$

which completes the proof.


For an interesting application of the preceding formula, let us use it to determine the 
expected number of independent uniform (0,1) random variables that need to be 
summed to exceed 1. That is, with $X_1, X_2, \ldots$ being independent uniform (0, 1)
random variables, we want to determine E[N], where

$$N = \min\{n : X_1 + \cdots + X_n > 1\}$$

Noting that N is greater than n > 0 if and only if $X_1 + \cdots + X_n \le 1$, we see that

$$P\{N > n\} = F_n(1) = 1/n!, \qquad n > 0$$

Because

$$P\{N > 0\} = 1 = 1/0!$$

we see that, for n > 0,

$$P\{N = n\} = P\{N > n - 1\} - P\{N > n\} = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}$$

Therefore,

$$E[N] = \sum_{n=1}^{\infty} \frac{n(n-1)}{n!} = \sum_{n=2}^{\infty} \frac{1}{(n-2)!} = e$$


That is, the mean number of independent uniform (0, 1) random variables that must 
be summed for the sum to exceed 1 is equal to e. 
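This result lends itself to a quick simulation check. The sketch below is illustrative only (function names, trial count, and seed are arbitrary choices): it repeatedly adds uniform (0, 1) numbers until the running sum exceeds 1 and averages the number of terms needed.

```python
import math
import random


def count_until_exceeds_one(rng):
    """Return N = min{n : X_1 + ... + X_n > 1} for uniform (0, 1) draws."""
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        n += 1
    return n


def estimate_mean(trials, seed=0):
    """Average of N over the given number of independent repetitions."""
    rng = random.Random(seed)
    return sum(count_until_exceeds_one(rng) for _ in range(trials)) / trials


print(estimate_mean(200_000), math.e)
```

With a few hundred thousand repetitions the average should agree with e to about two decimal places.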


6.3.2 Gamma Random Variables 


Recall that a gamma random variable has a density of the form

$$f(y) = \frac{\lambda e^{-\lambda y}(\lambda y)^{t-1}}{\Gamma(t)} \qquad 0 < y < \infty$$

An important property of this family of distributions is that for a fixed value of $\lambda$, it is
closed under convolutions.

Proposition 3.1

If X and Y are independent gamma random variables with respective parameters
$(s, \lambda)$ and $(t, \lambda)$, then X + Y is a gamma random variable with parameters
$(s + t, \lambda)$.


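Proof Using Equation (3.2), we obtain

$$\begin{aligned}
f_{X+Y}(a) &= \frac{1}{\Gamma(s)\Gamma(t)}\int_0^a \lambda e^{-\lambda(a-y)}[\lambda(a-y)]^{s-1}\, \lambda e^{-\lambda y}(\lambda y)^{t-1}\,dy \\
&= K e^{-\lambda a}\int_0^a (a-y)^{s-1} y^{t-1}\,dy \\
&= K e^{-\lambda a} a^{s+t-1}\int_0^1 (1-x)^{s-1} x^{t-1}\,dx \qquad \text{by letting } x = \frac{y}{a} \\
&= C e^{-\lambda a} a^{s+t-1}
\end{aligned}$$

where C is a constant that does not depend on a. But, as the preceding is a
density function and thus must integrate to 1, the value of C is determined, and
we have

$$f_{X+Y}(a) = \frac{\lambda e^{-\lambda a}(\lambda a)^{s+t-1}}{\Gamma(s+t)}$$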


Hence, the result is proved. 


It is now a simple matter to establish, by using Proposition 3.1 and induction,
that if $X_i$, $i = 1, \ldots, n$ are independent gamma random variables with respective
parameters $(t_i, \lambda)$, $i = 1, \ldots, n$, then $\sum_{i=1}^{n} X_i$ is gamma with parameters
$\left(\sum_{i=1}^{n} t_i, \lambda\right)$.


We leave the proof of this statement as an exercise. 
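Proposition 3.1 can be spot-checked numerically by evaluating the convolution integral for particular parameter values. The sketch below is an illustration under stated assumptions: the shapes s = 2 and t = 3, the rate 1.5, and the midpoint-rule step count are arbitrary choices, and `math.gamma` from the standard library supplies the gamma function.

```python
import math


def gamma_density(y, shape, rate):
    """Gamma density with parameters (shape, rate), in the notation (t, lambda)."""
    if y <= 0:
        return 0.0
    return rate * math.exp(-rate * y) * (rate * y) ** (shape - 1) / math.gamma(shape)


def convolved_density(a, s, t, rate, steps=20000):
    """Midpoint-rule value of the integral over 0 < y < a of f_X(a - y) f_Y(y)."""
    width = a / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * width
        total += gamma_density(a - y, s, rate) * gamma_density(y, t, rate)
    return total * width


s, t, rate, a = 2.0, 3.0, 1.5, 2.0
print(convolved_density(a, s, t, rate), gamma_density(a, s + t, rate))
```

The two printed values should agree to several decimal places, consistent with X + Y being gamma with parameters (s + t, lambda).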


Example 3b 


Let $X_1, X_2, \ldots, X_n$ be n independent exponential random variables, each having
parameter $\lambda$. Then, since an exponential random variable with parameter $\lambda$ is the
same as a gamma random variable with parameters $(1, \lambda)$, it follows from
Proposition 3.1 that $X_1 + X_2 + \cdots + X_n$ is a gamma random variable with
parameters $(n, \lambda)$.


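If $Z_1, Z_2, \ldots, Z_n$ are independent standard normal random variables, then
$Y = \sum_{i=1}^{n} Z_i^2$ is
said to have the chi-squared (sometimes seen as $\chi^2$) distribution with n degrees of
freedom. Let us compute the density function of Y. When n = 1, $Y = Z_1^2$, and from
Example 7b of Chapter 5, we see that its probability density function is given
by

$$\begin{aligned}
f_{Z^2}(y) &= \frac{1}{2\sqrt{y}}\left[f_Z(\sqrt{y}) + f_Z(-\sqrt{y})\right] \\
&= \frac{1}{2\sqrt{y}}\,\frac{2}{\sqrt{2\pi}}\,e^{-y/2} \\
&= \frac{\frac{1}{2}e^{-y/2}(y/2)^{1/2-1}}{\sqrt{\pi}}
\end{aligned}$$

But we recognize the preceding as the gamma distribution with parameters $\left(\frac{1}{2}, \frac{1}{2}\right)$.
[A by-product of this analysis is that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.] But since each $Z_i^2$ is gamma
$\left(\frac{1}{2}, \frac{1}{2}\right)$, it follows from Proposition 3.1 that the chi-squared distribution with n
degrees of freedom is just the gamma distribution with parameters $\left(n/2, \frac{1}{2}\right)$ and
hence has a probability density function given by

$$\begin{aligned}
f_Y(y) &= \frac{\frac{1}{2}e^{-y/2}\left(\frac{y}{2}\right)^{n/2-1}}{\Gamma\left(\frac{n}{2}\right)} \qquad y > 0 \\
&= \frac{e^{-y/2}\, y^{n/2-1}}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)} \qquad y > 0
\end{aligned}$$

When n is an even integer, $\Gamma(n/2) = [(n/2) - 1]!$, whereas when n is odd, $\Gamma(n/2)$
can be obtained from iterating the relationship $\Gamma(t) = (t - 1)\Gamma(t - 1)$ and then using
the previously obtained result that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$. [For instance,
$\Gamma\left(\frac{5}{2}\right) = \frac{3}{2}\Gamma\left(\frac{3}{2}\right) = \frac{3}{2}\cdot\frac{1}{2}\,\Gamma\left(\frac{1}{2}\right) = \frac{3}{4}\sqrt{\pi}$.]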
In practice, the chi-squared distribution often arises as the distribution of the square
of the error involved when one attempts to hit a target in n-dimensional space when
the coordinate errors are taken to be independent standard normal random
variables. It is also important in statistical analysis.
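The gamma-function values and the chi-squared density discussed above are easy to confirm with the standard `math` module. The sketch below is illustrative; the function names and the test point (y = 3, n = 5) are arbitrary choices.

```python
import math


def chi_squared_density(y, n):
    """Chi-squared density with n degrees of freedom, y > 0."""
    return math.exp(-y / 2) * y ** (n / 2 - 1) / (2 ** (n / 2) * math.gamma(n / 2))


def gamma_density(y, shape, rate):
    """Gamma density with parameters (shape, rate), y > 0."""
    return rate * math.exp(-rate * y) * (rate * y) ** (shape - 1) / math.gamma(shape)


# Gamma(1/2) = sqrt(pi) and Gamma(5/2) = (3/4) sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))
print(math.gamma(2.5), 0.75 * math.sqrt(math.pi))

# The chi-squared density with n = 5 coincides with the gamma (5/2, 1/2) density
print(chi_squared_density(3.0, 5), gamma_density(3.0, 2.5, 0.5))
```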


6.3.3 Normal Random Variables 


We can also use Equation (3.2) to prove the following important result about 
normal random variables. 


Proposition 3.2 


If $X_i$, $i = 1, \ldots, n$, are independent random variables that are normally distributed
with respective parameters $\mu_i, \sigma_i^2$, $i = 1, \ldots, n$, then $\sum_{i=1}^{n} X_i$ is normally distributed
with parameters $\sum_{i=1}^{n} \mu_i$ and $\sum_{i=1}^{n} \sigma_i^2$.


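Proof of Proposition 3.2: To begin, let X and Y be independent normal random
variables with X having mean 0 and variance $\sigma^2$ and Y having mean 0 and
variance 1. We will determine the density function of X + Y by utilizing Equation
(3.2). Now, with

$$c = \frac{1}{2\sigma^2} + \frac{1}{2} = \frac{1 + \sigma^2}{2\sigma^2}$$

we have

$$f_X(a - y) f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(a-y)^2}{2\sigma^2}\right\}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{y^2}{2}\right\}$$

where the exponent can be rewritten by completing the square in y:

$$\begin{aligned}
\frac{(a-y)^2}{2\sigma^2} + \frac{y^2}{2} &= \frac{a^2}{2\sigma^2} + c\left(y^2 - \frac{2ya}{1+\sigma^2}\right) \\
&= \frac{a^2}{2\sigma^2} + c\left(y - \frac{a}{1+\sigma^2}\right)^2 - \frac{a^2}{2\sigma^2(1+\sigma^2)} \\
&= \frac{a^2}{2\sigma^2}\left(1 - \frac{1}{1+\sigma^2}\right) + c\left(y - \frac{a}{1+\sigma^2}\right)^2 \\
&= \frac{a^2}{2(1+\sigma^2)} + c\left(y - \frac{a}{1+\sigma^2}\right)^2
\end{aligned}$$

Hence,

$$f_X(a - y) f_Y(y) = \frac{1}{2\pi\sigma}\exp\left\{-\frac{a^2}{2(1+\sigma^2)}\right\}\exp\left\{-c\left(y - \frac{a}{1+\sigma^2}\right)^2\right\}$$

From Equation (3.2), we obtain that

$$\begin{aligned}
f_{X+Y}(a) &= \frac{1}{2\pi\sigma}\exp\left\{-\frac{a^2}{2(1+\sigma^2)}\right\}\int_{-\infty}^{\infty}\exp\left\{-c\left(y - \frac{a}{1+\sigma^2}\right)^2\right\}dy \\
&= \frac{1}{2\pi\sigma}\exp\left\{-\frac{a^2}{2(1+\sigma^2)}\right\}\int_{-\infty}^{\infty}\exp\{-cx^2\}\,dx \\
&= C\exp\left\{-\frac{a^2}{2(1+\sigma^2)}\right\}
\end{aligned}$$

where C does not depend on a. But this implies that X + Y is normal with mean 0
and variance $1 + \sigma^2$.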


Now, suppose that $X_1$ and $X_2$ are independent normal random variables with $X_i$
having mean $\mu_i$ and variance $\sigma_i^2$, i = 1, 2. Then

$$X_1 + X_2 = \sigma_2\left(\frac{X_1 - \mu_1}{\sigma_2} + \frac{X_2 - \mu_2}{\sigma_2}\right) + \mu_1 + \mu_2$$

But since $(X_1 - \mu_1)/\sigma_2$ is normal with mean 0 and variance $\sigma_1^2/\sigma_2^2$, and
$(X_2 - \mu_2)/\sigma_2$ is normal with mean 0 and variance 1, it follows from our previous
result that $(X_1 - \mu_1)/\sigma_2 + (X_2 - \mu_2)/\sigma_2$ is normal with mean 0 and variance
$1 + \sigma_1^2/\sigma_2^2$, implying that $X_1 + X_2$ is normal with mean $\mu_1 + \mu_2$ and variance
$\sigma_2^2(1 + \sigma_1^2/\sigma_2^2) = \sigma_1^2 + \sigma_2^2$.


Thus, Proposition 3.2 is established when n = 2. The general case now
follows by induction. That is, assume that Proposition 3.2 is true when there
are n - 1 random variables. Now consider the case of n, and write

$$\sum_{i=1}^{n} X_i = \sum_{i=1}^{n-1} X_i + X_n$$

By the induction hypothesis, $\sum_{i=1}^{n-1} X_i$ is normal with mean $\sum_{i=1}^{n-1} \mu_i$ and variance
$\sum_{i=1}^{n-1} \sigma_i^2$. Therefore, by the result for n = 2, $\sum_{i=1}^{n} X_i$ is normal with mean $\sum_{i=1}^{n} \mu_i$
and variance $\sum_{i=1}^{n} \sigma_i^2$.
Example 3c 


A basketball team will play a 44-game season. Twenty-six of these games are 
against class A teams and 18 are against class B teams. Suppose that the team 
will win each game against a class A team with probability .4 and will win each 
game against a class B team with probability .7. Suppose also that the results of 
the different games are independent. Approximate the probability that 


a. the team wins 25 games or more; 
b. the team wins more games against class A teams than it does against 
class B teams. 


Solution 


a. Let $X_A$ and $X_B$ respectively denote the number of games the team wins
against class A and against class B teams. Note that $X_A$ and $X_B$ are
independent binomial random variables and

$$E[X_A] = 26(.4) = 10.4 \qquad \mathrm{Var}(X_A) = 26(.4)(.6) = 6.24$$

$$E[X_B] = 18(.7) = 12.6 \qquad \mathrm{Var}(X_B) = 18(.7)(.3) = 3.78$$

By the normal approximation to the binomial, $X_A$ and $X_B$ will have
approximately the same distribution as would independent normal random
variables with the preceding expected values and variances. Hence, by
Proposition 3.2, $X_A + X_B$ will have approximately a normal distribution
with mean 23 and variance 10.02. Therefore, letting Z denote a standard
normal random variable, we have

$$\begin{aligned}
P\{X_A + X_B \ge 25\} &= P\{X_A + X_B \ge 24.5\} \\
&= P\left\{\frac{X_A + X_B - 23}{\sqrt{10.02}} \ge \frac{24.5 - 23}{\sqrt{10.02}}\right\} \\
&\approx P\left\{Z \ge \frac{1.5}{\sqrt{10.02}}\right\} \\
&\approx 1 - P\{Z < .4739\} \\
&\approx .3178
\end{aligned}$$

b. We note that $X_A - X_B$ will have approximately a normal distribution with
mean -2.2 and variance 10.02. Hence,

$$\begin{aligned}
P\{X_A - X_B \ge 1\} &= P\{X_A - X_B \ge .5\} \\
&= P\left\{\frac{X_A - X_B + 2.2}{\sqrt{10.02}} \ge \frac{.5 + 2.2}{\sqrt{10.02}}\right\} \\
&\approx P\left\{Z \ge \frac{2.7}{\sqrt{10.02}}\right\} \\
&\approx 1 - P\{Z < .8530\} \\
&\approx .1968
\end{aligned}$$


Therefore, there is approximately a 31.78 percent chance that the team will win 
at least 25 games and approximately a 19.68 percent chance that it will win more 
games against class A teams than against class B teams. 
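The two approximations in Example 3c can be reproduced with the standard normal distribution function written in terms of `math.erf`. This is a small sketch; the variable and function names are illustrative, and the continuity corrections (24.5 and .5) are those used in the solution.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean_a, var_a = 26 * 0.4, 26 * 0.4 * 0.6    # wins against class A teams
mean_b, var_b = 18 * 0.7, 18 * 0.7 * 0.3    # wins against class B teams
sd = math.sqrt(var_a + var_b)               # same for the sum and the difference

# (a) P{X_A + X_B >= 25}, with continuity correction at 24.5
p_at_least_25 = 1 - std_normal_cdf((24.5 - (mean_a + mean_b)) / sd)

# (b) P{X_A - X_B >= 1}, with continuity correction at 0.5
p_more_against_a = 1 - std_normal_cdf((0.5 - (mean_a - mean_b)) / sd)

print(round(p_at_least_25, 4), round(p_more_against_a, 4))
```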


The random variable Y is said to be a lognormal random variable with parameters $\mu$
and $\sigma$ if log(Y) is a normal random variable with mean $\mu$ and variance $\sigma^2$. That is, Y
is lognormal if it can be expressed as

$$Y = e^X$$
where X is a normal random variable. 


Example 3d 


Starting at some fixed time, let S(n) denote the price of a certain security at the
end of n additional weeks, $n \ge 1$. A popular model for the evolution of these
prices assumes that the price ratios S(n)/S(n - 1), $n \ge 1$, are independent and
identically distributed lognormal random variables. Assuming this model, with
parameters $\mu = .0165$, $\sigma = .0730$, what is the probability that


a. the price of the security increases over each of the next two weeks? 
b. the price at the end of two weeks is higher than it is today? 
Solution 


Let Z be a standard normal random variable. To solve part (a), we use the fact
that log(x) increases in x to conclude that x > 1 if and only if
log(x) > log(1) = 0. As a result, we have

$$\begin{aligned}
P\left\{\frac{S(1)}{S(0)} > 1\right\} &= P\left\{\log\left(\frac{S(1)}{S(0)}\right) > 0\right\} \\
&= P\left\{Z > \frac{-.0165}{.0730}\right\} \\
&= P\{Z < .2260\} \\
&= .5894
\end{aligned}$$

In other words, the probability that the price is up after one week is .5894. Since
the successive price ratios are independent, the probability that the price
increases over each of the next two weeks is $(.5894)^2 = .3474$.

To solve part (b), we reason as follows:

$$\begin{aligned}
P\left\{\frac{S(2)}{S(0)} > 1\right\} &= P\left\{\frac{S(2)}{S(1)}\,\frac{S(1)}{S(0)} > 1\right\} \\
&= P\left\{\log\left(\frac{S(2)}{S(1)}\right) + \log\left(\frac{S(1)}{S(0)}\right) > 0\right\}
\end{aligned}$$

However, $\log\left(\frac{S(2)}{S(1)}\right) + \log\left(\frac{S(1)}{S(0)}\right)$, being the sum of two independent normal
random variables with a common mean .0165 and a common standard deviation
.0730, is a normal random variable with mean .0330 and variance $2(.0730)^2$.
Consequently,

$$\begin{aligned}
P\left\{\frac{S(2)}{S(0)} > 1\right\} &= P\left\{Z > \frac{-.0330}{.0730\sqrt{2}}\right\} \\
&= P\{Z < .31965\} \\
&= .6254
\end{aligned}$$
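The three probabilities in Example 3d can be recomputed in a few lines. The sketch below is illustrative (names are not from the text) and again expresses the standard normal distribution function through `math.erf`.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


MU, SIGMA = 0.0165, 0.0730   # parameters of the weekly log price ratio

# (a) one-week increase, then two consecutive independent increases
p_up_one_week = 1 - std_normal_cdf((0 - MU) / SIGMA)
p_up_both_weeks = p_up_one_week ** 2

# (b) the two-week log ratio is normal with mean 2*MU and variance 2*SIGMA**2
p_up_after_two = 1 - std_normal_cdf((0 - 2 * MU) / (SIGMA * math.sqrt(2)))

print(round(p_up_one_week, 4), round(p_up_both_weeks, 4), round(p_up_after_two, 4))
```

Note that the answer to part (b) exceeds that of part (a): the price can end higher after two weeks without rising in both of them.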


6.3.4 Poisson and Binomial Random Variables 


Rather than attempt to derive a general expression for the distribution of X + Y in the 
discrete case, we shall consider some examples. 


Example 3e Sums of independent Poisson random variables

If X and Y are independent Poisson random variables with respective parameters
$\lambda_1$ and $\lambda_2$, compute the distribution of X + Y.

Solution 


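Because the event {X + Y = n} may be written as the union of the disjoint events
{X = k, Y = n - k}, $0 \le k \le n$, we have

$$\begin{aligned}
P\{X + Y = n\} &= \sum_{k=0}^{n} P\{X = k, Y = n - k\} \\
&= \sum_{k=0}^{n} P\{X = k\}P\{Y = n - k\} \\
&= \sum_{k=0}^{n} e^{-\lambda_1}\frac{\lambda_1^k}{k!}\, e^{-\lambda_2}\frac{\lambda_2^{n-k}}{(n-k)!} \\
&= e^{-(\lambda_1 + \lambda_2)}\sum_{k=0}^{n} \frac{\lambda_1^k \lambda_2^{n-k}}{k!\,(n-k)!} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!}\sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\,\lambda_1^k \lambda_2^{n-k} \\
&= \frac{e^{-(\lambda_1 + \lambda_2)}}{n!}(\lambda_1 + \lambda_2)^n
\end{aligned}$$

Thus, X + Y has a Poisson distribution with parameter $\lambda_1 + \lambda_2$.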
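The discrete convolution used in Example 3e can be evaluated directly and compared with the Poisson mass function with parameter $\lambda_1 + \lambda_2$. The sketch below is illustrative; the parameter values 1.3 and 2.1 are arbitrary.

```python
import math


def poisson_pmf(k, lam):
    """P{X = k} for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pmf_of_sum(n, lam1, lam2):
    """P{X + Y = n} as the sum over k of P{X = k} P{Y = n - k}."""
    return sum(poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2) for k in range(n + 1))


lam1, lam2 = 1.3, 2.1
for n in range(5):
    print(n, pmf_of_sum(n, lam1, lam2), poisson_pmf(n, lam1 + lam2))
```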


Example 3f Sums of independent binomial random variables 


Let X and Y be independent binomial random variables with respective 
parameters (n, p) and (m, p). Calculate the distribution of X+ Y. 


Solution 


Recalling the interpretation of a binomial random variable, and without any 
computation at all, we can immediately conclude that X + Y is binomial with 
parameters (n + m,p). This follows because X represents the number of 
successes in n independent trials, each of which results in a success with 
probability p; similarly, Y represents the number of successes in m independent 
trials, each of which results in a success with probability p. Hence, given that X 
and Y are assumed independent, it follows that X + Y represents the number of 
successes in n + m independent trials when each trial has a probability p of 
resulting in a success. Therefore, X + Y is a binomial random variable with 
parameters (n + m, p). To check this conclusion analytically, note that

$$\begin{aligned}
P\{X + Y = k\} &= \sum_{i=0}^{n} P\{X = i, Y = k - i\} \\
&= \sum_{i=0}^{n} P\{X = i\}P\{Y = k - i\} \\
&= \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i}\binom{m}{k-i} p^{k-i} q^{m-k+i}
\end{aligned}$$

where q = 1 - p and where $\binom{r}{j} = 0$ when j < 0. Thus,

$$P\{X + Y = k\} = p^k q^{n+m-k}\sum_{i=0}^{n}\binom{n}{i}\binom{m}{k-i}$$

and the conclusion follows upon application of the combinatorial identity

$$\binom{n+m}{k} = \sum_{i=0}^{n}\binom{n}{i}\binom{m}{k-i}$$
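The combinatorial identity that closes Example 3f can be verified by brute force for small n and m. The sketch below assumes Python 3.8 or later for `math.comb`; the helper `binom_or_zero` is a hypothetical name that implements the convention that the binomial coefficient is 0 when its lower index is negative.

```python
from math import comb


def binom_or_zero(r, j):
    """Binomial coefficient with the convention that it is 0 when j < 0."""
    return comb(r, j) if j >= 0 else 0


def identity_sum(n, m, k):
    """Right-hand side of the identity: sum over i of C(n, i) * C(m, k - i)."""
    return sum(comb(n, i) * binom_or_zero(m, k - i) for i in range(n + 1))


print(identity_sum(4, 3, 5), comb(7, 5))
```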


6.4 Conditional Distributions: Discrete Case 


Recall that for any two events E and F, the conditional probability of E given F is
defined, provided that P(F) > 0, by

$$P(E \mid F) = \frac{P(EF)}{P(F)}$$

Hence, if X and Y are discrete random variables, it is natural to define the conditional
probability mass function of X given that Y = y, by

$$\begin{aligned}
p_{X|Y}(x|y) &= P\{X = x \mid Y = y\} \\
&= \frac{P\{X = x, Y = y\}}{P\{Y = y\}} \\
&= \frac{p(x, y)}{p_Y(y)}
\end{aligned}$$

for all values of y such that $p_Y(y) > 0$. Similarly, the conditional probability
distribution function of X given that Y = y is defined, for all y such that $p_Y(y) > 0$, by

$$\begin{aligned}
F_{X|Y}(x|y) &= P\{X \le x \mid Y = y\} \\
&= \sum_{a \le x} p_{X|Y}(a|y)
\end{aligned}$$

In other words, the definitions are exactly the same as in the unconditional case, 
except that everything is now conditional on the event that Y = y. If X is independent 
of Y, then the conditional mass function and the distribution function are the same as 
the respective unconditional ones. This follows because if X is independent of Y, 


then 

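$$\begin{aligned}
p_{X|Y}(x|y) &= P\{X = x \mid Y = y\} \\
&= \frac{P\{X = x, Y = y\}}{P\{Y = y\}} \\
&= \frac{P\{X = x\}P\{Y = y\}}{P\{Y = y\}} \\
&= P\{X = x\}
\end{aligned}$$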

Example 4a 


Suppose that p(x, y), the joint probability mass function of X and Y, is given by 
p(0,0)=.4 p(0,1)=.2 p(1,0)=.1 p(1,1)=.3 
Calculate the conditional probability mass function of X given that Y = 1. 


Solution 


We first note that 


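$$p_Y(1) = \sum_x p(x, 1) = p(0, 1) + p(1, 1) = .5$$

Hence,

$$p_{X|Y}(0|1) = \frac{p(0, 1)}{p_Y(1)} = \frac{2}{5}$$

and

$$p_{X|Y}(1|1) = \frac{p(1, 1)}{p_Y(1)} = \frac{3}{5}$$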
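The calculation in Example 4a is an instance of a general recipe: restrict the joint mass function to the given value of Y and renormalize. A minimal sketch follows, using exact fractions; the dictionary layout and the function name are illustrative choices, not part of the text.

```python
from fractions import Fraction

# Joint mass function of Example 4a, keyed by (x, y)
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(3, 10),
}


def conditional_pmf_of_x(joint_pmf, y):
    """Return {x: p_{X|Y}(x|y)}; defined only when p_Y(y) > 0."""
    p_y = sum(p for (_, yy), p in joint_pmf.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning value has probability 0")
    return {x: p / p_y for (x, yy), p in joint_pmf.items() if yy == y}


print(conditional_pmf_of_x(joint, 1))
```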
Example 4b

If X and Y are independent Poisson random variables with respective parameters
$\lambda_1$ and $\lambda_2$, calculate the conditional distribution of X given that X + Y = n.


Solution 


We calculate the conditional probability mass function of X given that X + Y =n 
as follows: 


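$$\begin{aligned}
P\{X = k \mid X + Y = n\} &= \frac{P\{X = k, X + Y = n\}}{P\{X + Y = n\}} \\
&= \frac{P\{X = k, Y = n - k\}}{P\{X + Y = n\}} \\
&= \frac{P\{X = k\}P\{Y = n - k\}}{P\{X + Y = n\}}
\end{aligned}$$

where the last equality follows from the assumed independence of X and Y.
Recalling (Example 3e) that X + Y has a Poisson distribution with parameter
$\lambda_1 + \lambda_2$, we see that the preceding equals

$$\begin{aligned}
P\{X = k \mid X + Y = n\} &= \frac{e^{-\lambda_1}\lambda_1^k}{k!}\,\frac{e^{-\lambda_2}\lambda_2^{n-k}}{(n-k)!}\left[\frac{e^{-(\lambda_1+\lambda_2)}(\lambda_1+\lambda_2)^n}{n!}\right]^{-1} \\
&= \frac{n!}{(n-k)!\,k!}\,\frac{\lambda_1^k\lambda_2^{n-k}}{(\lambda_1+\lambda_2)^n} \\
&= \binom{n}{k}\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{n-k}
\end{aligned}$$

In other words, the conditional distribution of X given that X + Y = n is the
binomial distribution with parameters n and $\lambda_1/(\lambda_1 + \lambda_2)$.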


We can also talk about joint conditional distributions, as is indicated in the next two 
examples. 


Example 4c 


Consider the multinomial distribution with joint probability mass function

$$P\{X_i = n_i,\ i = 1, \ldots, k\} = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k} \qquad n_i \ge 0,\ \sum_{i=1}^{k} n_i = n$$

Such a mass function results when n independent trials are performed, with each
trial resulting in outcome i with probability $p_i$, $\sum_{i=1}^{k} p_i = 1$. The random variables
$X_i$, $i = 1, \ldots, k$, represent, respectively, the number of trials that result in outcome
i, $i = 1, \ldots, k$. Suppose we are given that $n_j$ of the trials resulted in outcome j, for
$j = r + 1, \ldots, k$, where $\sum_{j=r+1}^{k} n_j = m \le n$.
Then, because each of the other n - m trials must have resulted in one of the
outcomes $1, \ldots, r$, it would seem that the conditional distribution of $X_1, \ldots, X_r$ is the
multinomial distribution on n - m trials with respective trial outcome probabilities

$$P\{\text{outcome } i \mid \text{outcome is not any of } r + 1, \ldots, k\} = \frac{p_i}{F_r}, \qquad i = 1, \ldots, r$$

where $F_r = \sum_{i=1}^{r} p_i$ is the probability that a trial results in one of the outcomes
$1, \ldots, r$.


Solution 


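To verify this intuition, let $n_1, \ldots, n_r$ be such that $\sum_{i=1}^{r} n_i = n - m$. Then

$$\begin{aligned}
&P\{X_1 = n_1, \ldots, X_r = n_r \mid X_{r+1} = n_{r+1}, \ldots, X_k = n_k\} \\
&\quad = \frac{P\{X_1 = n_1, \ldots, X_k = n_k\}}{P\{X_{r+1} = n_{r+1}, \ldots, X_k = n_k\}} \\
&\quad = \frac{\dfrac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_r^{n_r}\, p_{r+1}^{n_{r+1}} \cdots p_k^{n_k}}{\dfrac{n!}{(n-m)!\, n_{r+1}! \cdots n_k!}\, F_r^{\,n-m}\, p_{r+1}^{n_{r+1}} \cdots p_k^{n_k}}
\end{aligned}$$

where the probability in the denominator was obtained by regarding outcomes
$1, \ldots, r$ as a single outcome having probability $F_r$, thus showing that the
probability is a multinomial probability on n trials with outcome probabilities
$F_r, p_{r+1}, \ldots, p_k$. Because $\sum_{i=1}^{r} n_i = n - m$, the preceding can be written as

$$P\{X_1 = n_1, \ldots, X_r = n_r \mid X_{r+1} = n_{r+1}, \ldots, X_k = n_k\} = \frac{(n-m)!}{n_1! \cdots n_r!}\left(\frac{p_1}{F_r}\right)^{n_1} \cdots \left(\frac{p_r}{F_r}\right)^{n_r}$$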


and our intuition is upheld. 


Example 4d 


Consider n independent trials, with each trial being a success with probability p.
Given a total of k successes, show that all possible orderings of the k successes
and n − k failures are equally likely.


Solution 


We want to show that given a total of k successes, each of the $\binom{n}{k}$ possible
orderings of k successes and n − k failures is equally likely. Let X denote the
number of successes, and consider any ordering of k successes and n − k
failures, say, o = (s,...,s,f,...,f). Then

\[
\begin{aligned}
P(o \mid X = k) &= \frac{P(o, X = k)}{P(X = k)} \\
&= \frac{P(o)}{P(X = k)} \\
&= \frac{p^{k}(1-p)^{n-k}}{\binom{n}{k}p^{k}(1-p)^{n-k}} \\
&= \frac{1}{\binom{n}{k}}
\end{aligned}
\]


6.5 Conditional Distributions: Continuous Case


If X and Y have a joint probability density function f(x, y), then the conditional
probability density function of X given that Y = y is defined, for all values of y such
that $f_Y(y) > 0$, by

\[
f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}
\]

To motivate this definition, multiply the left-hand side by dx and the right-hand side
by (dx dy)/dy to obtain

\[
\begin{aligned}
f_{X|Y}(x|y)\,dx &= \frac{f(x,y)\,dx\,dy}{f_Y(y)\,dy} \\
&\approx \frac{P\{x \le X \le x+dx,\ y \le Y \le y+dy\}}{P\{y \le Y \le y+dy\}} \\
&= P\{x \le X \le x+dx \mid y \le Y \le y+dy\}
\end{aligned}
\]

In other words, for small values of dx and dy, $f_{X|Y}(x|y)\,dx$ represents the conditional
probability that X is between x and x + dx given that Y is between y and y + dy.


The use of conditional densities allows us to define conditional probabilities of events 
associated with one random variable when we are given the value of a second 
random variable. That is, if X and Y are jointly continuous, then, for any set A, 


\[
P\{X \in A \mid Y = y\} = \int_A f_{X|Y}(x|y)\,dx
\]

In particular, by letting $A = (-\infty, a]$, we can define the conditional cumulative
distribution function of X given that Y = y by

\[
F_{X|Y}(a|y) \equiv P\{X \le a \mid Y = y\} = \int_{-\infty}^{a} f_{X|Y}(x|y)\,dx
\]


The reader should note that by using the ideas presented in the preceding 
discussion, we have been able to give workable expressions for conditional 
probabilities, even though the event on which we are conditioning (namely, the event 
{Y = y}) has probability 0. 


If X and Y are independent continuous random variables, the conditional density of X 
given that Y = y is just the unconditional density of X. This is so because, in the 
independent case, 


\[
f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} = \frac{f_X(x)f_Y(y)}{f_Y(y)} = f_X(x)
\]


Example 5a 


The joint density of X and Y is given by

\[
f(x,y) =
\begin{cases}
\dfrac{12}{5}\,x(2 - x - y) & 0 < x < 1,\ 0 < y < 1 \\
0 & \text{otherwise}
\end{cases}
\]

Compute the conditional density of X given that Y = y, where 0 < y < 1.


Solution 


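For 0 < x < 1, 0 < y < 1, we have

\[
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x,y)}{f_Y(y)} \\
&= \frac{f(x,y)}{\int_{-\infty}^{\infty} f(x,y)\,dx} \\
&= \frac{x(2-x-y)}{\int_0^1 x(2-x-y)\,dx} \\
&= \frac{x(2-x-y)}{\frac{2}{3} - y/2} \\
&= \frac{6x(2-x-y)}{4-3y}
\end{aligned}
\]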
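
The closed form in Example 5a can be checked numerically. The sketch below, under the assumption that a simple midpoint rule is accurate enough for these polynomial integrands, confirms that the conditional density integrates to 1 over 0 < x < 1 and agrees with f(x, y) divided by the numerically computed marginal. The helper names and the test point y = 0.3 are illustrative choices.

```python
def joint(x, y):
    """Joint density of Example 5a: (12/5) x (2 - x - y) on the unit square."""
    if 0.0 < x < 1.0 and 0.0 < y < 1.0:
        return 12.0 / 5.0 * x * (2.0 - x - y)
    return 0.0


def conditional(x, y):
    """Closed-form conditional density 6x(2 - x - y) / (4 - 3y), 0 < x < 1."""
    return 6.0 * x * (2.0 - x - y) / (4.0 - 3.0 * y)


def midpoint_integral(g, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


y = 0.3  # hypothetical conditioning value, any 0 < y < 1 works
marginal_y = midpoint_integral(lambda x: joint(x, y), 0.0, 1.0)
total_mass = midpoint_integral(lambda x: conditional(x, y), 0.0, 1.0)
print(total_mass)                                    # should be close to 1
print(joint(0.5, y) / marginal_y, conditional(0.5, y))  # should nearly agree
```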


Example 5b 
Suppose that the joint density of X and Y is given by

\[
f(x,y) =
\begin{cases}
\dfrac{e^{-x/y}e^{-y}}{y} & 0 < x < \infty,\ 0 < y < \infty \\
0 & \text{otherwise}
\end{cases}
\]

Find P{X > 1 | Y = y}.
Solution 
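We first obtain the conditional density of X given that Y = y.

\[
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x,y)}{f_Y(y)} \\
&= \frac{e^{-x/y}e^{-y}/y}{e^{-y}\int_0^{\infty}(1/y)e^{-x/y}\,dx} \\
&= \frac{1}{y}\,e^{-x/y}
\end{aligned}
\]

Hence,

\[
\begin{aligned}
P\{X > 1 \mid Y = y\} &= \int_1^{\infty}\frac{1}{y}\,e^{-x/y}\,dx \\
&= -e^{-x/y}\Big|_1^{\infty} \\
&= e^{-1/y}
\end{aligned}
\]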


Example 5c The t-distribution

If Z and Y are independent, with Z having a standard normal distribution and Y
having a chi-squared distribution with n degrees of freedom, then the random
variable T defined by

\[
T = \frac{Z}{\sqrt{Y/n}} = \sqrt{n}\,\frac{Z}{\sqrt{Y}}
\]

is said to have a t-distribution with n degrees of freedom. As will be seen in
Section 7.8, the t-distribution has important applications in statistical
inference. At present, we will content ourselves with computing its density
function. This will be accomplished by using the conditional density of T given Y
to obtain the joint density function of T and Y, from which we will then obtain the
marginal density of T. To begin, note that because of the independence of Z and
Y, it follows that the conditional distribution of T given that Y = y is the
distribution of $\sqrt{n/y}\,Z$, which is normal with mean 0 and variance n/y. Hence,
the conditional density of T given that Y = y is

\[
f_{T|Y}(t|y) = \frac{1}{\sqrt{2\pi n/y}}\,e^{-t^2 y/2n}, \qquad -\infty < t < \infty
\]

Using the preceding, along with the following formula for the chi-squared density
given in Example 3b of this chapter,

\[
f_Y(y) = \frac{e^{-y/2}\,y^{n/2-1}}{2^{n/2}\,\Gamma(n/2)}, \qquad y > 0
\]

we obtain that the joint density of T, Y is

\[
\begin{aligned}
f_{T,Y}(t,y) &= \frac{1}{\sqrt{2\pi n}\,2^{n/2}\,\Gamma(n/2)}\,e^{-t^2 y/2n}\,e^{-y/2}\,y^{(n-1)/2} \\
&= \frac{1}{\sqrt{\pi n}\,2^{(n+1)/2}\,\Gamma(n/2)}\,e^{-\frac{t^2+n}{2n}\,y}\,y^{(n-1)/2},
\qquad y > 0,\ -\infty < t < \infty
\end{aligned}
\]

Letting $c = \dfrac{t^2+n}{2n}$ and integrating the preceding over all y gives

\[
\begin{aligned}
f_T(t) &= \int_0^{\infty} f_{T,Y}(t,y)\,dy \\
&= \frac{1}{\sqrt{\pi n}\,2^{(n+1)/2}\,\Gamma(n/2)}\int_0^{\infty} e^{-cy}\,y^{(n-1)/2}\,dy \\
&= \frac{c^{-(n+1)/2}}{\sqrt{\pi n}\,2^{(n+1)/2}\,\Gamma(n/2)}\int_0^{\infty} e^{-x}\,x^{(n-1)/2}\,dx
   \qquad (\text{by letting } x = cy) \\
&= \frac{n^{(n+1)/2}\,\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\,(t^2+n)^{(n+1)/2}\,\Gamma\!\left(\frac{n}{2}\right)}
   \qquad \left(\text{because } \frac{1}{c} = \frac{2n}{t^2+n}\right) \\
&= \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\,\Gamma\!\left(\frac{n}{2}\right)}
   \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < t < \infty
\end{aligned}
\]


Example 5d The bivariate normal distribution 


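One of the most important joint distributions is the bivariate normal distribution.
We say that the random variables X, Y have a bivariate normal distribution if, for
constants $\mu_x$, $\mu_y$, $\sigma_x > 0$, $\sigma_y > 0$, $-1 < \rho < 1$, their joint density function is given,
for all $-\infty < x, y < \infty$, by

\[
f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2
+ \left(\frac{y-\mu_y}{\sigma_y}\right)^2
- 2\rho\,\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}
\]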


We now determine the conditional density of X given that Y = y. In doing so, we
will continually collect all factors that do not depend on x and represent them by
the constants $C_i$. The final constant will then be found by using that
$\int_{-\infty}^{\infty} f_{X|Y}(x|y)\,dx = 1$. We have

\[
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x,y)}{f_Y(y)} \\
&= C_1 f(x,y) \\
&= C_2 \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2
   - 2\rho\,\frac{x(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\} \\
&= C_3 \exp\left\{-\frac{1}{2\sigma_x^2(1-\rho^2)}\left[x^2
   - 2x\left(\mu_x + \rho\,\frac{\sigma_x}{\sigma_y}(y-\mu_y)\right)\right]\right\} \\
&= C_4 \exp\left\{-\frac{1}{2\sigma_x^2(1-\rho^2)}\left[x
   - \left(\mu_x + \rho\,\frac{\sigma_x}{\sigma_y}(y-\mu_y)\right)\right]^2\right\}
\end{aligned}
\]

Recognizing the preceding equation as a normal density, we can conclude that
given Y = y, the random variable X is normally distributed with mean
$\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y-\mu_y)$ and variance $\sigma_x^2(1-\rho^2)$. Also, because the joint density of
Y, X is exactly the same as that of X, Y, except that $\mu_x, \sigma_x$ are interchanged with
$\mu_y, \sigma_y$, it similarly follows that the conditional distribution of Y given X = x is the
normal distribution with mean $\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)$ and variance $\sigma_y^2(1-\rho^2)$. It
follows from these results that the necessary and sufficient condition for the
bivariate normal random variables X and Y to be independent is that $\rho = 0$ (a
result that also follows directly from their joint density, because it is only when
$\rho = 0$ that the joint density factors into two terms, one depending only on x and
the other only on y).


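With $C = \dfrac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}$, the marginal density of X can be obtained from

\[
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x,y)\,dy \\
&= C\int_{-\infty}^{\infty}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2
+ \left(\frac{y-\mu_y}{\sigma_y}\right)^2
- 2\rho\,\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right\}dy
\end{aligned}
\]

Now, with $w = \dfrac{y-\mu_y}{\sigma_y}$,

\[
\begin{aligned}
\left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2
- 2\rho\,\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}
&= \left(\frac{x-\mu_x}{\sigma_x}\right)^2 + w^2 - 2\rho\,\frac{(x-\mu_x)w}{\sigma_x} \\
&= \left(w - \rho\,\frac{x-\mu_x}{\sigma_x}\right)^2 + (1-\rho^2)\left(\frac{x-\mu_x}{\sigma_x}\right)^2
\end{aligned}
\]

Hence, making the change of variable $w = \dfrac{y-\mu_y}{\sigma_y}$ yields that

\[
\begin{aligned}
f_X(x) &= C\sigma_y\,e^{-(x-\mu_x)^2/2\sigma_x^2}\int_{-\infty}^{\infty}
\exp\left\{-\frac{1}{2(1-\rho^2)}\left(w - \rho\,\frac{x-\mu_x}{\sigma_x}\right)^2\right\}dw \\
&= C\sigma_y\,e^{-(x-\mu_x)^2/2\sigma_x^2}\int_{-\infty}^{\infty}
\exp\left\{-\frac{v^2}{2(1-\rho^2)}\right\}dv
\qquad \left(\text{by letting } v = w - \rho\,\frac{x-\mu_x}{\sigma_x}\right) \\
&= K e^{-(x-\mu_x)^2/2\sigma_x^2}
\end{aligned}
\]

where K does not depend on x. But this shows that X is normal with mean $\mu_x$
and variance $\sigma_x^2$. Similarly, we can show that Y is normal with mean $\mu_y$ and
variance $\sigma_y^2$.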


We can also talk about conditional distributions when the random variables are
neither jointly continuous nor jointly discrete. For example, suppose that X is a
continuous random variable having probability density function f and N is a discrete
random variable, and consider the conditional distribution of X given that N = n.
Then

\[
\frac{P\{x < X < x+dx \mid N = n\}}{dx}
= \frac{P\{N = n \mid x < X < x+dx\}}{P\{N = n\}}\,\frac{P\{x < X < x+dx\}}{dx}
\]

and letting dx approach 0 gives

\[
\lim_{dx \to 0}\frac{P\{x < X < x+dx \mid N = n\}}{dx} = \frac{P\{N = n \mid X = x\}}{P\{N = n\}}\,f(x)
\]

thus showing that the conditional density of X given that N = n is given by

\[
f_{X|N}(x|n) = \frac{P\{N = n \mid X = x\}}{P\{N = n\}}\,f(x)
\]


Example 5e 


Consider n + m trials having a common probability of success. Suppose, 
however, that this success probability is not fixed in advance but is chosen from a 
uniform (0, 1) population. What is the conditional distribution of the success 
probability given that the n + m trials result in n successes? 


Solution 


If we let X denote the probability that a given trial is a success, then X is a 
uniform (0, 1) random variable. Also, given that X = x, the n + m trials are 
independent with common probability of success x, so N, the number of 
successes, is a binomial random variable with parameters (n + m,x). Hence, the 
conditional density of X given that N = n is

\[
\begin{aligned}
f_{X|N}(x|n) &= \frac{P\{N = n \mid X = x\}\,f_X(x)}{P\{N = n\}} \\
&= \frac{\binom{n+m}{n}x^{n}(1-x)^{m}}{P\{N = n\}}, \qquad 0 < x < 1 \\
&= c\,x^{n}(1-x)^{m}
\end{aligned}
\]

where c does not depend on x. Thus, the conditional density is that of a beta
random variable with parameters n + 1, m + 1.


The preceding result is quite interesting, for it states that if the original or prior (to 
the collection of data) distribution of a trial success probability is uniformly 
distributed over (0, 1) [or, equivalently, is beta with parameters (1, 1)], then the 
posterior (or conditional) distribution given a total of n successes in n + m trials is 
beta with parameters (1 + n,1+m). This is valuable, for it enhances our 
intuition as to what it means to assume that a random variable has a beta 
distribution. 
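
To make the posterior concrete, the following Python sketch evaluates the density $c\,x^{n}(1-x)^{m}$ and checks numerically that it integrates to 1. It assumes the standard beta normalizing constant, c = (n + m + 1)·C(n + m, n), which is not stated explicitly in the text; the function names and the values n = 3, m = 2 are illustrative.

```python
from math import comb


def posterior_density(x, n, m):
    """Beta(n + 1, m + 1) density c * x^n * (1 - x)^m on (0, 1).

    Assumes c = (n + m + 1) * C(n + m, n), the usual beta normalizing constant.
    """
    c = (n + m + 1) * comb(n + m, n)
    return c * x**n * (1 - x) ** m


def midpoint_integral(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# Hypothetical data: n = 3 successes and m = 2 failures.
n_successes, m_failures = 3, 2
total_mass = midpoint_integral(lambda x: posterior_density(x, n_successes, m_failures), 0.0, 1.0)
print(total_mass)  # should be close to 1
```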


We are often interested in the conditional distribution of a random variable X given 
that X lies in some set A. When X is discrete, the conditional probability mass 
function is given by 


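\[
P(X = x \mid X \in A) = \frac{P(X = x,\ X \in A)}{P(X \in A)} =
\begin{cases}
\dfrac{P(X = x)}{P(X \in A)}, & \text{if } x \in A \\
0, & \text{if } x \notin A
\end{cases}
\]

Similarly, when X is continuous with density function f, the conditional density
function of X given that $X \in A$ is

\[
f_{X|X \in A}(x) = \frac{f(x)}{P(X \in A)} = \frac{f(x)}{\int_A f(y)\,dy}, \qquad x \in A
\]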


Example 5f 
A Pareto random variable with positive parameters $a$, $\lambda$ has distribution function

\[
F(x) = 1 - a^{\lambda}x^{-\lambda}, \qquad x > a
\]

and density function

\[
f(x) = \lambda a^{\lambda}x^{-\lambda-1}, \qquad x > a
\]

An important feature of Pareto distributions is that for $x_0 > a$ the conditional
distribution of a Pareto random variable X with parameters $a$ and $\lambda$, given that it
exceeds $x_0$, is the Pareto distribution with parameters $x_0$ and $\lambda$. This follows
because

\[
f_{X|X > x_0}(x) = \frac{f(x)}{P\{X > x_0\}}
= \frac{\lambda a^{\lambda}x^{-\lambda-1}}{a^{\lambda}x_0^{-\lambda}}
= \lambda x_0^{\lambda}x^{-\lambda-1}, \qquad x > x_0
\]

thus verifying that the conditional distribution is Pareto with parameters $x_0$ and $\lambda$.

*6.6 Order Statistics


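Let $X_1, X_2, \ldots, X_n$ be n independent and identically distributed continuous random
variables having a common density f and distribution function F. Define

\[
\begin{aligned}
X_{(1)} &= \text{smallest of } X_1, X_2, \ldots, X_n \\
X_{(2)} &= \text{second smallest of } X_1, X_2, \ldots, X_n \\
&\ \ \vdots \\
X_{(j)} &= j\text{th smallest of } X_1, X_2, \ldots, X_n \\
&\ \ \vdots \\
X_{(n)} &= \text{largest of } X_1, X_2, \ldots, X_n
\end{aligned}
\]

The ordered values $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are known as the order statistics
corresponding to the random variables $X_1, X_2, \ldots, X_n$. In other words, $X_{(1)}, \ldots, X_{(n)}$ are
the ordered values of $X_1, \ldots, X_n$.

The joint density function of the order statistics is obtained by noting that the order
statistics $X_{(1)}, \ldots, X_{(n)}$ will take on the values $x_1 \le x_2 \le \cdots \le x_n$ if and only if, for some
permutation $(i_1, i_2, \ldots, i_n)$ of (1, 2,...,n),

\[
X_1 = x_{i_1},\ X_2 = x_{i_2},\ \ldots,\ X_n = x_{i_n}
\]

Since, for any permutation $(i_1, \ldots, i_n)$ of (1, 2,...,n),

\[
\begin{aligned}
P\left\{x_{i_1} - \frac{\varepsilon}{2} < X_1 < x_{i_1} + \frac{\varepsilon}{2},\ \ldots,\
x_{i_n} - \frac{\varepsilon}{2} < X_n < x_{i_n} + \frac{\varepsilon}{2}\right\}
&\approx \varepsilon^{n} f_{X_1,\ldots,X_n}(x_{i_1}, \ldots, x_{i_n}) \\
&= \varepsilon^{n} f(x_{i_1})\cdots f(x_{i_n}) \\
&= \varepsilon^{n} f(x_1)\cdots f(x_n)
\end{aligned}
\]

it follows that, for $x_1 < x_2 < \cdots < x_n$,

\[
P\left\{x_1 - \frac{\varepsilon}{2} < X_{(1)} < x_1 + \frac{\varepsilon}{2},\ \ldots,\
x_n - \frac{\varepsilon}{2} < X_{(n)} < x_n + \frac{\varepsilon}{2}\right\}
\approx n!\,\varepsilon^{n} f(x_1)\cdots f(x_n)
\]

Dividing by $\varepsilon^{n}$ and letting $\varepsilon \to 0$ yields

(6.1)

\[
f_{X_{(1)},\ldots,X_{(n)}}(x_1, x_2, \ldots, x_n) = n!\,f(x_1)\cdots f(x_n), \qquad x_1 < x_2 < \cdots < x_n
\]

Equation (6.1) is most simply explained by arguing that, in order for the vector
$(X_{(1)}, \ldots, X_{(n)})$ to equal $(x_1, \ldots, x_n)$, it is necessary and sufficient for $(X_1, \ldots, X_n)$
to equal one of the n! permutations of $(x_1, \ldots, x_n)$. Since the probability (density)
that $(X_1, \ldots, X_n)$ equals any given permutation of $(x_1, \ldots, x_n)$ is just $f(x_1)\cdots f(x_n)$,
Equation (6.1) follows.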


Example 6a 


Along a road 1 mile long are 3 people “distributed at random.” Find the probability
that no 2 people are less than a distance of d miles apart when $d \le \frac{1}{2}$.


Solution 


Let us assume that “distributed at random” means that the positions of the 3
people are independent and uniformly distributed over the road. If $X_i$ denotes the
position of the ith person, then the desired probability is
$P\{X_{(i)} > X_{(i-1)} + d,\ i = 2, 3\}$. Because

\[
f_{X_{(1)},X_{(2)},X_{(3)}}(x_1, x_2, x_3) = 3!, \qquad 0 < x_1 < x_2 < x_3 < 1
\]

it follows that

\[
\begin{aligned}
P\{X_{(i)} > X_{(i-1)} + d,\ i = 2, 3\}
&= \iiint_{x_i > x_{i-1} + d} f_{X_{(1)},X_{(2)},X_{(3)}}(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3 \\
&= 3!\int_0^{1-2d}\int_{x_1+d}^{1-d}\int_{x_2+d}^{1} dx_3\,dx_2\,dx_1 \\
&= 6\int_0^{1-2d}\int_{x_1+d}^{1-d}(1 - d - x_2)\,dx_2\,dx_1 \\
&= 6\int_0^{1-2d}\int_0^{1-2d-x_1} y_2\,dy_2\,dx_1
\end{aligned}
\]

where we have made the change of variables $y_2 = 1 - d - x_2$. Continuing the
string of equalities yields

\[
\begin{aligned}
&= 3\int_0^{1-2d}(1 - 2d - x_1)^2\,dx_1 \\
&= 3\int_0^{1-2d} y_1^2\,dy_1 \\
&= (1 - 2d)^3
\end{aligned}
\]

Hence, the desired probability that no 2 people are within a distance d of each
other when 3 people are uniformly and independently distributed over an interval
of size 1 is $(1 - 2d)^3$ when $d \le \frac{1}{2}$. In fact, the same method can be used to
prove that when n people are distributed at random over the unit interval, the
desired probability is

\[
[1 - (n-1)d]^{n} \qquad \text{when } d \le \frac{1}{n-1}
\]


The proof is left as an exercise. 
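
A quick Monte Carlo experiment gives an informal check of Example 6a and of the general formula $[1 - (n-1)d]^{n}$. The sketch below is only a simulation under stated assumptions (independent uniform positions, a fixed seed, 100,000 trials); the function name `prob_all_apart` is hypothetical.

```python
import random


def prob_all_apart(n, d, trials=100_000, seed=0):
    """Estimate the probability that n uniform(0, 1) points are pairwise more than d apart."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        points = sorted(rng.random() for _ in range(n))
        # It suffices to compare consecutive order statistics.
        if all(b - a > d for a, b in zip(points, points[1:])):
            hits += 1
    return hits / trials


estimate = prob_all_apart(3, 0.1)
exact = (1 - 2 * 0.1) ** 3
print(estimate, exact)  # the estimate should be close to the exact value
```

Comparing only consecutive order statistics is enough here, because if adjacent sorted positions are more than d apart, then so is every other pair.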


The density function of the jth-order statistic $X_{(j)}$ can be obtained either by
integrating the joint density function (6.1) or by direct reasoning as follows: In
order for $X_{(j)}$ to equal x, it is necessary for j − 1 of the n values $X_1, \ldots, X_n$ to be less
than x, n − j of them to be greater than x, and 1 of them to equal x. Now, the
probability density that any given set of j − 1 of the $X_i$'s are less than x, another
given set of n − j are all greater than x, and the remaining value is equal to x equals

\[
[F(x)]^{j-1}[1 - F(x)]^{n-j}f(x)
\]

Hence, since there are

\[
\binom{n}{j-1,\ n-j,\ 1} = \frac{n!}{(n-j)!\,(j-1)!}
\]

different partitions of the n random variables $X_1, \ldots, X_n$ into the preceding three
groups, it follows that the density function of $X_{(j)}$ is given by

(6.2)

\[
f_{X_{(j)}}(x) = \frac{n!}{(n-j)!\,(j-1)!}\,[F(x)]^{j-1}[1 - F(x)]^{n-j}f(x)
\]


Example 6b 


When a sample of 2n + 1 random variables (that is, when 2n + 1 independent
and identically distributed random variables) is observed, the (n + 1)st smallest is
called the sample median. If a sample of size 3 from a uniform distribution over
(0, 1) is observed, find the probability that the sample median is between $\frac{1}{4}$ and
$\frac{3}{4}$.

Solution 


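From Equation (6.2), the density of $X_{(2)}$ is given by

\[
f_{X_{(2)}}(x) = \frac{3!}{1!\,1!}\,x(1 - x), \qquad 0 < x < 1
\]

Hence,

\[
\begin{aligned}
P\left\{\frac{1}{4} < X_{(2)} < \frac{3}{4}\right\} &= 6\int_{1/4}^{3/4} x(1 - x)\,dx \\
&= 6\left\{\frac{x^2}{2} - \frac{x^3}{3}\right\}\Bigg|_{x=1/4}^{x=3/4} \\
&= \frac{11}{16}
\end{aligned}
\]

The cumulative distribution function of $X_{(j)}$ can be found by integrating Equation
(6.2). That is,

(6.3)

\[
F_{X_{(j)}}(y) = \frac{n!}{(n-j)!\,(j-1)!}\int_{-\infty}^{y}[F(x)]^{j-1}[1 - F(x)]^{n-j}f(x)\,dx
\]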


However, $F_{X_{(j)}}(y)$ could also have been derived directly by noting that the jth order
statistic is less than or equal to y if and only if there are j or more of the $X_i$'s that are
less than or equal to y. Thus, because the number of $X_i$'s that are less than or equal
to y is a binomial random variable with parameters n, p = F(y), it follows that

(6.4)

\[
\begin{aligned}
F_{X_{(j)}}(y) &= P\{X_{(j)} \le y\} = P\{j \text{ or more of the } X_i\text{'s are} \le y\} \\
&= \sum_{k=j}^{n}\binom{n}{k}[F(y)]^{k}[1 - F(y)]^{n-k}
\end{aligned}
\]

If, in Equations (6.3) and (6.4), we take F to be the uniform (0, 1) distribution
[that is, f(x) = 1, 0 < x < 1], then we obtain the interesting analytical identity

(6.5)

\[
\sum_{k=j}^{n}\binom{n}{k}y^{k}(1 - y)^{n-k}
= \frac{n!}{(n-j)!\,(j-1)!}\int_0^{y} x^{j-1}(1 - x)^{n-j}\,dx, \qquad 0 \le y \le 1
\]
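
The identity (6.5) can be checked exactly with rational arithmetic. The Python sketch below, whose function names are illustrative assumptions, evaluates the binomial sum and the integral (expanded term by term with the binomial theorem) as fractions. With n = 3 and j = 2 the difference of the distribution function at 3/4 and 1/4 also reproduces the sample-median probability of Example 6b.

```python
from fractions import Fraction
from math import comb, factorial


def cdf_binomial_sum(j, n, y):
    """Binomial-sum side of (6.5): P{at least j of n uniforms are <= y}."""
    return sum(comb(n, k) * y**k * (1 - y) ** (n - k) for k in range(j, n + 1))


def cdf_integral(j, n, y):
    """Integral side of (6.5), integrating x^(j-1) (1-x)^(n-j) term by term."""
    const = Fraction(factorial(n), factorial(n - j) * factorial(j - 1))
    # (1 - x)^(n-j) = sum_i C(n-j, i) (-1)^i x^i, so each term integrates to y^(j+i)/(j+i).
    integral = sum(
        Fraction((-1) ** i * comb(n - j, i), j + i) * y ** (j + i)
        for i in range(n - j + 1)
    )
    return const * integral


lo, hi = Fraction(1, 4), Fraction(3, 4)
median_prob = cdf_binomial_sum(2, 3, hi) - cdf_binomial_sum(2, 3, lo)
print(median_prob)  # Fraction(11, 16), matching Example 6b
print(cdf_binomial_sum(2, 3, hi) == cdf_integral(2, 3, hi))
```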


We can employ the same type of argument that we used in establishing Equation
(6.2) to find the joint density of $X_{(i)}$ and $X_{(j)}$, the ith and jth smallest of the values
$X_1, \ldots, X_n$. For suppose that i < j and $x_i < x_j$. Then the event that $X_{(i)} = x_i$, $X_{(j)} = x_j$
is equivalent to the event that the n data values can be divided into 5 groups of
respective sizes i − 1, 1, j − i − 1, 1, n − j, that satisfy the condition that all i − 1
members of the first group have values less than $x_i$, the one member of the second
group has value $x_i$, all j − i − 1 members of the third group have values between $x_i$
and $x_j$, the one member of the fourth group has value $x_j$, and all n − j members of
the last group have values greater than $x_j$. Now, for any specified division of the n
values into 5 such groups, the preceding condition will hold with probability (density)

\[
[F(x_i)]^{i-1}\,f(x_i)\,[F(x_j) - F(x_i)]^{j-i-1}\,f(x_j)\,[1 - F(x_j)]^{n-j}
\]

As there are $\dfrac{n!}{(i-1)!\,1!\,(j-i-1)!\,1!\,(n-j)!}$ such divisions of the n values, and as the
condition cannot hold for more than one of these divisions, it follows, for
i < j, $x_i < x_j$, that

(6.6)

\[
f_{X_{(i)},X_{(j)}}(x_i, x_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,
[F(x_i)]^{i-1}\,[F(x_j) - F(x_i)]^{j-i-1}\,[1 - F(x_j)]^{n-j}\,f(x_i)\,f(x_j)
\]


Example 6c Distribution of the range of a random sample 


Suppose that n independent and identically distributed random variables
$X_1, X_2, \ldots, X_n$ are observed. The random variable R defined by $R = X_{(n)} - X_{(1)}$ is
called the range of the observed random variables. If the random variables $X_i$
have distribution function F and density function f, then the distribution of R can
be obtained from Equation (6.6) as follows: For $a \ge 0$,

\[
\begin{aligned}
P\{R \le a\} &= P\{X_{(n)} - X_{(1)} \le a\} \\
&= \iint_{x_n - x_1 \le a} f_{X_{(1)},X_{(n)}}(x_1, x_n)\,dx_1\,dx_n \\
&= \int_{-\infty}^{\infty}\int_{x_1}^{x_1+a}\frac{n!}{(n-2)!}\,[F(x_n) - F(x_1)]^{n-2}f(x_1)f(x_n)\,dx_n\,dx_1
\end{aligned}
\]

Making the change of variable $y = F(x_n) - F(x_1)$, $dy = f(x_n)\,dx_n$ yields

\[
\begin{aligned}
\int_{x_1}^{x_1+a}[F(x_n) - F(x_1)]^{n-2}f(x_n)\,dx_n &= \int_0^{F(x_1+a)-F(x_1)} y^{n-2}\,dy \\
&= \frac{1}{n-1}\,[F(x_1 + a) - F(x_1)]^{n-1}
\end{aligned}
\]

Thus,

(6.7)

\[
P\{R \le a\} = n\int_{-\infty}^{\infty}[F(x_1 + a) - F(x_1)]^{n-1}f(x_1)\,dx_1
\]

Equation (6.7) can be evaluated explicitly only in a few special cases. One
such case is when the $X_i$'s are all uniformly distributed on (0, 1). In this case, we
obtain, from Equation (6.7), that for 0 < a < 1,

\[
\begin{aligned}
P\{R < a\} &= n\int_0^1[F(x_1 + a) - F(x_1)]^{n-1}f(x_1)\,dx_1 \\
&= n\int_0^{1-a} a^{n-1}\,dx_1 + n\int_{1-a}^{1}(1 - x_1)^{n-1}\,dx_1 \\
&= n(1 - a)a^{n-1} + a^{n}
\end{aligned}
\]

Differentiation yields the density function of the range, given in this case by

\[
f_R(a) =
\begin{cases}
n(n-1)a^{n-2}(1 - a) & 0 \le a \le 1 \\
0 & \text{otherwise}
\end{cases}
\]

That is, the range of n independent uniform (0, 1) random variables is a beta
random variable with parameters n − 1, 2.
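
The closed form for the distribution of the range of uniform (0, 1) samples can be compared with a simulation. The sketch below is an informal check only; the function names, the seed, the number of trials, and the values n = 5, a = 0.6 are assumptions made for illustration.

```python
import random


def range_cdf_uniform(a, n):
    """P{R <= a} = n (1 - a) a^(n-1) + a^n for n uniform(0, 1) variables, 0 < a < 1."""
    return n * (1 - a) * a ** (n - 1) + a**n


def simulate_range_cdf(a, n, trials=100_000, seed=0):
    """Monte Carlo estimate of P{max - min <= a} for n independent uniform(0, 1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if max(sample) - min(sample) <= a:
            hits += 1
    return hits / trials


formula_value = range_cdf_uniform(0.6, 5)
simulated_value = simulate_range_cdf(0.6, 5)
print(formula_value, simulated_value)  # the two values should be close
```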


6.7 Joint Probability Distribution of Functions of Random Variables


Let $X_1$ and $X_2$ be jointly continuous random variables with joint probability density
function $f_{X_1,X_2}$. It is sometimes necessary to obtain the joint distribution of the
random variables $Y_1$ and $Y_2$, which arise as functions of $X_1$ and $X_2$. Specifically,
suppose that $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ for some functions $g_1$ and $g_2$.

Assume that the functions $g_1$ and $g_2$ satisfy the following conditions:

1. The equations $y_1 = g_1(x_1, x_2)$ and $y_2 = g_2(x_1, x_2)$ can be uniquely solved for
$x_1$ and $x_2$ in terms of $y_1$ and $y_2$, with solutions given by, say,
$x_1 = h_1(y_1, y_2)$, $x_2 = h_2(y_1, y_2)$.
2. The functions $g_1$ and $g_2$ have continuous partial derivatives at all points
$(x_1, x_2)$ and are such that the 2 × 2 determinant

\[
J(x_1, x_2) =
\begin{vmatrix}
\dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} \\[2ex]
\dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2}
\end{vmatrix}
\equiv \frac{\partial g_1}{\partial x_1}\frac{\partial g_2}{\partial x_2}
- \frac{\partial g_1}{\partial x_2}\frac{\partial g_2}{\partial x_1} \ne 0
\]

at all points $(x_1, x_2)$.

Under these two conditions, it can be shown that the random variables $Y_1$ and $Y_2$ are
jointly continuous with joint density function given by

(7.1)

\[
f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(x_1, x_2)\,|J(x_1, x_2)|^{-1}
\]

where $x_1 = h_1(y_1, y_2)$, $x_2 = h_2(y_1, y_2)$.
A proof of Equation (7.1) would proceed along the following lines:

(7.2)

\[
P\{Y_1 \le y_1,\ Y_2 \le y_2\} =
\iint\limits_{\substack{(x_1,x_2):\\ g_1(x_1,x_2) \le y_1\\ g_2(x_1,x_2) \le y_2}}
f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2
\]

The joint density function can now be obtained by differentiating Equation (7.2)
with respect to $y_1$ and $y_2$. That the result of this differentiation will be equal to the
right-hand side of Equation (7.1) is an exercise in advanced calculus whose proof
will not be presented in this book.


Example 7a 


Let $X_1$ and $X_2$ be jointly continuous random variables with probability density
function $f_{X_1,X_2}$. Let $Y_1 = X_1 + X_2$, $Y_2 = X_1 - X_2$. Find the joint density function of
$Y_1$ and $Y_2$ in terms of $f_{X_1,X_2}$.

Solution

Let $g_1(x_1, x_2) = x_1 + x_2$ and $g_2(x_1, x_2) = x_1 - x_2$. Then

\[
J(x_1, x_2) = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2
\]

Also, since the equations $y_1 = x_1 + x_2$ and $y_2 = x_1 - x_2$ have
$x_1 = (y_1 + y_2)/2$, $x_2 = (y_1 - y_2)/2$ as their solution, it follows from Equation
(7.1) that the desired density is

\[
f_{Y_1,Y_2}(y_1, y_2) = \frac{1}{2}\,f_{X_1,X_2}\!\left(\frac{y_1 + y_2}{2},\ \frac{y_1 - y_2}{2}\right)
\]

For instance, if $X_1$ and $X_2$ are independent uniform (0, 1) random variables, then

\[
f_{Y_1,Y_2}(y_1, y_2) =
\begin{cases}
\dfrac{1}{2} & 0 \le y_1 + y_2 \le 2,\ 0 \le y_1 - y_2 \le 2 \\
0 & \text{otherwise}
\end{cases}
\]

or if $X_1$ and $X_2$ are independent exponential random variables with respective
parameters $\lambda_1$ and $\lambda_2$, then

\[
f_{Y_1,Y_2}(y_1, y_2) =
\begin{cases}
\dfrac{\lambda_1\lambda_2}{2}\exp\left\{-\lambda_1\left(\dfrac{y_1 + y_2}{2}\right)
- \lambda_2\left(\dfrac{y_1 - y_2}{2}\right)\right\} & y_1 + y_2 \ge 0,\ y_1 - y_2 \ge 0 \\
0 & \text{otherwise}
\end{cases}
\]

Finally, if $X_1$ and $X_2$ are independent standard normal random variables, then

\[
\begin{aligned}
f_{Y_1,Y_2}(y_1, y_2) &= \frac{1}{4\pi}\,e^{-[(y_1 + y_2)^2/8 + (y_1 - y_2)^2/8]} \\
&= \frac{1}{4\pi}\,e^{-(y_1^2 + y_2^2)/4} \\
&= \frac{1}{\sqrt{4\pi}}\,e^{-y_1^2/4}\,\frac{1}{\sqrt{4\pi}}\,e^{-y_2^2/4}
\end{aligned}
\]

Thus, not only do we obtain (in agreement with Proposition 3.2) that both
$X_1 + X_2$ and $X_1 - X_2$ are normal with mean 0 and variance 2, but we also
conclude that these two random variables are independent. (In fact, it can be
shown that if $X_1$ and $X_2$ are independent random variables having a common
distribution function F, then $X_1 + X_2$ will be independent of $X_1 - X_2$ if and only if
F is a normal distribution function.)


Example 7b 


Let (X, Y) denote a random point in the plane, and assume that the rectangular
coordinates X and Y are independent standard normal random variables. We are
interested in the joint distribution of R, $\Theta$, the polar coordinate representation of
(x, y). (See Figure 6.4.)

Figure 6.4 • = Random point. (X, Y) = (R, $\Theta$).


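Suppose first that X and Y are both positive. For x and y positive, letting
$r = g_1(x, y) = \sqrt{x^2 + y^2}$ and $\theta = g_2(x, y) = \tan^{-1}(y/x)$, we see that

\[
\begin{aligned}
\frac{\partial g_1}{\partial x} &= \frac{x}{\sqrt{x^2 + y^2}} \\
\frac{\partial g_1}{\partial y} &= \frac{y}{\sqrt{x^2 + y^2}} \\
\frac{\partial g_2}{\partial x} &= \frac{1}{1 + (y/x)^2}\left(\frac{-y}{x^2}\right) = \frac{-y}{x^2 + y^2} \\
\frac{\partial g_2}{\partial y} &= \frac{1}{x[1 + (y/x)^2]} = \frac{x}{x^2 + y^2}
\end{aligned}
\]

Hence,

\[
J(x, y) = \frac{x^2}{(x^2 + y^2)^{3/2}} + \frac{y^2}{(x^2 + y^2)^{3/2}} = \frac{1}{\sqrt{x^2 + y^2}} = \frac{1}{r}
\]

Because the conditional joint density function of X, Y given that they are both
positive is

\[
f(x, y \mid X > 0, Y > 0) = \frac{f(x, y)}{P(X > 0, Y > 0)} = \frac{2}{\pi}\,e^{-(x^2 + y^2)/2}, \qquad x > 0,\ y > 0
\]

we see that the conditional joint density function of $R = \sqrt{X^2 + Y^2}$ and
$\Theta = \tan^{-1}(Y/X)$, given that X and Y are both positive, is

\[
f(r, \theta \mid X > 0, Y > 0) = \frac{2}{\pi}\,r e^{-r^2/2}, \qquad 0 < \theta < \pi/2,\ 0 < r < \infty
\]

Similarly, we can show that

\[
\begin{aligned}
f(r, \theta \mid X < 0, Y > 0) &= \frac{2}{\pi}\,r e^{-r^2/2}, \qquad \pi/2 < \theta < \pi,\ 0 < r < \infty \\
f(r, \theta \mid X < 0, Y < 0) &= \frac{2}{\pi}\,r e^{-r^2/2}, \qquad \pi < \theta < 3\pi/2,\ 0 < r < \infty \\
f(r, \theta \mid X > 0, Y < 0) &= \frac{2}{\pi}\,r e^{-r^2/2}, \qquad 3\pi/2 < \theta < 2\pi,\ 0 < r < \infty
\end{aligned}
\]

As the joint density is an equally weighted average of these four conditional joint
densities, we obtain that the joint density of R, $\Theta$ is given by

\[
f(r, \theta) = \frac{1}{2\pi}\,r e^{-r^2/2}, \qquad 0 < \theta < 2\pi,\ 0 < r < \infty
\]

Now, this joint density factors into the marginal densities for R and $\Theta$, so R and $\Theta$
are independent random variables, with $\Theta$ being uniformly distributed over $(0, 2\pi)$
and R having the Rayleigh distribution with density

\[
f(r) = r e^{-r^2/2}, \qquad 0 < r < \infty
\]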


(For instance, when one is aiming at a target in the plane, if the horizontal and 
vertical miss distances are independent standard normals, then the absolute 
value of the error has the preceding Rayleigh distribution.) 


This result is quite interesting, for it certainly is not evident a priori that a random 
vector whose coordinates are independent standard normal random variables will 
have an angle of orientation that not only is uniformly distributed, but also is 
independent of the vector’s distance from the origin. 


If we wanted the joint distribution of $R^2$ and $\Theta$, then, since the transformation
$d = g_1(x, y) = x^2 + y^2$ and $\theta = g_2(x, y) = \tan^{-1}(y/x)$ has the Jacobian

\[
J = \begin{vmatrix} 2x & 2y \\[1ex] \dfrac{-y}{x^2 + y^2} & \dfrac{x}{x^2 + y^2} \end{vmatrix} = 2
\]

it follows that

\[
f(d, \theta) = \frac{1}{2}\,e^{-d/2}\,\frac{1}{2\pi}, \qquad 0 < d < \infty,\ 0 < \theta < 2\pi
\]

Therefore, $R^2$ and $\Theta$ are independent, with $R^2$ having an exponential distribution
with parameter $\frac{1}{2}$. But because $R^2 = X^2 + Y^2$, it follows by definition that $R^2$ has
a chi-squared distribution with 2 degrees of freedom. Hence, we have a
verification of the result that the exponential distribution with parameter $\frac{1}{2}$ is the
same as the chi-squared distribution with 2 degrees of freedom.


The preceding result can be used to simulate (or generate) normal random
variables by making a suitable transformation on uniform random variables. Let
$U_1$ and $U_2$ be independent random variables, each uniformly distributed over (0,
1). We will transform $U_1$, $U_2$ into two independent standard normal random
variables $X_1$ and $X_2$ by first considering the polar coordinate representation $(R, \Theta)$
of the random vector $(X_1, X_2)$. From the preceding, $R^2$ and $\Theta$ will be
independent, and, in addition, $R^2 = X_1^2 + X_2^2$ will have an exponential
distribution with parameter $\lambda = \frac{1}{2}$. But $-2\log U_1$ has such a distribution, since,
for x > 0,

\[
\begin{aligned}
P\{-2\log U_1 < x\} &= P\left\{\log U_1 > -\frac{x}{2}\right\} \\
&= P\{U_1 > e^{-x/2}\} \\
&= 1 - e^{-x/2}
\end{aligned}
\]

Also, because $2\pi U_2$ is a uniform $(0, 2\pi)$ random variable, we can use it to
generate $\Theta$. That is, if we let

\[
\begin{aligned}
R^2 &= -2\log U_1 \\
\Theta &= 2\pi U_2
\end{aligned}
\]

then $R^2$ can be taken to be the square of the distance from the origin and $\Theta$ can
be taken to be the angle of orientation of $(X_1, X_2)$. Now, since
$X_1 = R\cos\Theta$, $X_2 = R\sin\Theta$, it follows that

\[
\begin{aligned}
X_1 &= \sqrt{-2\log U_1}\,\cos(2\pi U_2) \\
X_2 &= \sqrt{-2\log U_1}\,\sin(2\pi U_2)
\end{aligned}
\]

are independent standard normal random variables.
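
The transformation just described is short enough to sketch directly in Python. This is a minimal illustration rather than a production generator: the function name `normal_pair`, the seed, and the sample size are assumptions, and `1 - rng.random()` is used for $U_1$ (it is still uniform) so that the logarithm is never evaluated at 0.

```python
import math
import random


def normal_pair(rng):
    """Return (X1, X2) built from two independent uniforms as in the text."""
    u1 = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))  # R = sqrt(-2 log U1)
    theta = 2.0 * math.pi * u2          # Theta = 2 pi U2
    return r * math.cos(theta), r * math.sin(theta)


rng = random.Random(1)
pairs = [normal_pair(rng) for _ in range(100_000)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

mean_x = sum(xs) / len(xs)
var_x = sum(x * x for x in xs) / len(xs) - mean_x**2
mean_y = sum(ys) / len(ys)
cov_xy = sum(x * y for x, y in pairs) / len(pairs) - mean_x * mean_y
print(mean_x, var_x, cov_xy)  # roughly 0, 1, and 0 respectively
```

The sample mean, variance, and covariance only give a rough plausibility check; they do not by themselves establish normality or independence.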


Example 7c 


If X and Y are independent gamma random variables with parameters $(\alpha, \lambda)$ and
$(\beta, \lambda)$, respectively, compute the joint density of U = X + Y and V = X/(X + Y).




Solution 


The joint density of X and Y is given by

\[
\begin{aligned}
f_{X,Y}(x, y) &= \frac{\lambda e^{-\lambda x}(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}\,
\frac{\lambda e^{-\lambda y}(\lambda y)^{\beta-1}}{\Gamma(\beta)} \\
&= \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\,e^{-\lambda(x+y)}\,x^{\alpha-1}y^{\beta-1}
\end{aligned}
\]

Now, if $g_1(x, y) = x + y$, $g_2(x, y) = x/(x + y)$, then

\[
\frac{\partial g_1}{\partial x} = \frac{\partial g_1}{\partial y} = 1, \qquad
\frac{\partial g_2}{\partial x} = \frac{y}{(x+y)^2}, \qquad
\frac{\partial g_2}{\partial y} = -\frac{x}{(x+y)^2}
\]

so

\[
J(x, y) = \begin{vmatrix} 1 & 1 \\[1ex] \dfrac{y}{(x+y)^2} & \dfrac{-x}{(x+y)^2} \end{vmatrix} = -\frac{1}{x+y}
\]

Finally, as the equations u = x + y, v = x/(x + y) have as their solutions
x = uv, y = u(1 − v), we see that

\[
\begin{aligned}
f_{U,V}(u, v) &= f_{X,Y}[uv,\ u(1-v)]\,u \\
&= \frac{\lambda e^{-\lambda u}(\lambda u)^{\alpha+\beta-1}}{\Gamma(\alpha+\beta)}\,
\frac{v^{\alpha-1}(1-v)^{\beta-1}\,\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}
\end{aligned}
\]

Hence, X + Y and X/(X + Y) are independent, with X + Y having a gamma
distribution with parameters $(\alpha + \beta, \lambda)$ and X/(X + Y) having a beta distribution
with parameters $(\alpha, \beta)$. The preceding reasoning also shows that $B(\alpha, \beta)$, the
normalizing factor in the beta density, is such that

\[
\begin{aligned}
B(\alpha, \beta) &\equiv \int_0^1 v^{\alpha-1}(1-v)^{\beta-1}\,dv \\
&= \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}
\end{aligned}
\]

This entire result is quite interesting. For suppose there are n + m jobs to be
performed, each (independently) taking an exponential amount of time with rate $\lambda$
to be completed, and suppose that we have two workers to perform these jobs.
Worker I will do jobs 1, 2,...,n, and worker II will do the remaining m jobs. If we let
X and Y denote the total working times of workers I and II, respectively, then
(either from the foregoing result or from Example 3b) X and Y will be
independent gamma random variables having parameters $(n, \lambda)$ and $(m, \lambda)$,
respectively. It then follows that independently of the working time needed to
complete all n + m jobs (that is, of X + Y), the proportion of this work that will be
performed by worker I has a beta distribution with parameters (n, m).


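When the joint density function of the n random variables $X_1, X_2, \ldots, X_n$ is given
and we want to compute the joint density function of $Y_1, Y_2, \ldots, Y_n$, where

\[
Y_1 = g_1(X_1, \ldots, X_n),\quad Y_2 = g_2(X_1, \ldots, X_n),\ \ldots,\quad Y_n = g_n(X_1, \ldots, X_n)
\]

the approach is the same: namely, we assume that the functions $g_i$ have
continuous partial derivatives and that the Jacobian determinant

\[
J(x_1, \ldots, x_n) =
\begin{vmatrix}
\dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\[2ex]
\dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & \cdots & \dfrac{\partial g_2}{\partial x_n} \\[1ex]
\vdots & \vdots & & \vdots \\[1ex]
\dfrac{\partial g_n}{\partial x_1} & \dfrac{\partial g_n}{\partial x_2} & \cdots & \dfrac{\partial g_n}{\partial x_n}
\end{vmatrix} \ne 0
\]

at all points $(x_1, \ldots, x_n)$. Furthermore, we suppose that the equations
$y_1 = g_1(x_1, \ldots, x_n)$, $y_2 = g_2(x_1, \ldots, x_n)$, ..., $y_n = g_n(x_1, \ldots, x_n)$ have a unique
solution, say, $x_1 = h_1(y_1, \ldots, y_n)$, ..., $x_n = h_n(y_1, \ldots, y_n)$. Under these
assumptions, the joint density function of the random variables $Y_i$ is given by

(7.3)

\[
f_{Y_1,\ldots,Y_n}(y_1, \ldots, y_n) = f_{X_1,\ldots,X_n}(x_1, \ldots, x_n)\,|J(x_1, \ldots, x_n)|^{-1}
\]

where $x_i = h_i(y_1, \ldots, y_n)$, i = 1, 2,...,n.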


Example 7d 


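Let $X_1$, $X_2$, and $X_3$ be independent standard normal random variables. If
$Y_1 = X_1 + X_2 + X_3$, $Y_2 = X_1 - X_2$, and $Y_3 = X_1 - X_3$, compute the joint density
function of $Y_1, Y_2, Y_3$.

Solution

Letting $Y_1 = X_1 + X_2 + X_3$, $Y_2 = X_1 - X_2$, $Y_3 = X_1 - X_3$, the Jacobian of these
transformations is given by

\[
J = \begin{vmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{vmatrix} = 3
\]

As the preceding transformations yield that

\[
X_1 = \frac{Y_1 + Y_2 + Y_3}{3}, \qquad X_2 = \frac{Y_1 - 2Y_2 + Y_3}{3}, \qquad X_3 = \frac{Y_1 + Y_2 - 2Y_3}{3}
\]

we see from Equation (7.3) that

\[
f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = \frac{1}{3}\,f_{X_1,X_2,X_3}\!\left(\frac{y_1 + y_2 + y_3}{3},\
\frac{y_1 - 2y_2 + y_3}{3},\ \frac{y_1 + y_2 - 2y_3}{3}\right)
\]

Hence, as

\[
f_{X_1,X_2,X_3}(x_1, x_2, x_3) = \frac{1}{(2\pi)^{3/2}}\,e^{-\sum_{i=1}^{3} x_i^2/2}
\]

we see that

\[
f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = \frac{1}{3(2\pi)^{3/2}}\,e^{-Q(y_1, y_2, y_3)/2}
\]

where

\[
\begin{aligned}
Q(y_1, y_2, y_3) &= \left(\frac{y_1 + y_2 + y_3}{3}\right)^2 + \left(\frac{y_1 - 2y_2 + y_3}{3}\right)^2
+ \left(\frac{y_1 + y_2 - 2y_3}{3}\right)^2 \\
&= \frac{y_1^2}{3} + \frac{2}{3}y_2^2 + \frac{2}{3}y_3^2 - \frac{2}{3}y_2 y_3
\end{aligned}
\]
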
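
The two computations in Example 7d (the Jacobian determinant of the transformation $Y_1 = X_1 + X_2 + X_3$, $Y_2 = X_1 - X_2$, $Y_3 = X_1 - X_3$, and the simplification of the quadratic form Q) are easy to verify mechanically. The sketch below uses hypothetical helper names and an arbitrary test point; it is a check of the arithmetic, not a general-purpose routine.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Rows are the partial derivatives of Y1, Y2, Y3 with respect to X1, X2, X3.
jacobian = [[1, 1, 1], [1, -1, 0], [1, 0, -1]]


def q_from_inverse(y1, y2, y3):
    """Q as the sum of squares of the inverse transformation x1, x2, x3."""
    x1 = (y1 + y2 + y3) / 3
    x2 = (y1 - 2 * y2 + y3) / 3
    x3 = (y1 + y2 - 2 * y3) / 3
    return x1**2 + x2**2 + x3**2


def q_closed_form(y1, y2, y3):
    """Simplified expression for Q."""
    return y1**2 / 3 + 2 * y2**2 / 3 + 2 * y3**2 / 3 - 2 * y2 * y3 / 3


print(det3(jacobian))  # 3
print(q_from_inverse(0.7, -1.2, 2.5), q_closed_form(0.7, -1.2, 2.5))
```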
Example 7e 


Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed exponential random
variables with rate $\lambda$. Let

\[
Y_i = X_1 + \cdots + X_i, \qquad i = 1, \ldots, n
\]

a. Find the joint density function of $Y_1, \ldots, Y_n$.
b. Use the result of part (a) to find the density of $Y_n$.
c. Find the conditional density of $Y_1, \ldots, Y_{n-1}$ given that $Y_n = t$.




Solution 


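a. The Jacobian of the transformations $Y_1 = X_1$, $Y_2 = X_1 + X_2$, ...,
$Y_n = X_1 + \cdots + X_n$ is

\[
J = \begin{vmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 \\
1 & 1 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
1 & 1 & 1 & \cdots & 1
\end{vmatrix}
\]

Since only the first term of the determinant will be nonzero, we have
J = 1. Now, the joint density function of $X_1, \ldots, X_n$ is given by

\[
f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n}\lambda e^{-\lambda x_i}, \qquad 0 < x_i < \infty,\ i = 1, \ldots, n
\]

Hence, because the preceding transformations yield

\[
X_1 = Y_1,\ X_2 = Y_2 - Y_1,\ \ldots,\ X_i = Y_i - Y_{i-1},\ \ldots,\ X_n = Y_n - Y_{n-1}
\]

it follows from Equation (7.3) that the joint density function of $Y_1, \ldots, Y_n$
is

\[
\begin{aligned}
f_{Y_1,\ldots,Y_n}(y_1, y_2, \ldots, y_n) &= f_{X_1,\ldots,X_n}(y_1,\ y_2 - y_1,\ \ldots,\ y_n - y_{n-1}) \\
&= \lambda^{n}\exp\left\{-\lambda\left[y_1 + \sum_{i=2}^{n}(y_i - y_{i-1})\right]\right\} \\
&= \lambda^{n}e^{-\lambda y_n}, \qquad 0 < y_1,\ 0 < y_i - y_{i-1},\ i = 2, \ldots, n \\
&= \lambda^{n}e^{-\lambda y_n}, \qquad 0 < y_1 < y_2 < \cdots < y_n
\end{aligned}
\]

b. To obtain the marginal density of $Y_n$, let us integrate out the other
variables one at a time. Doing this gives

\[
\begin{aligned}
f_{Y_2,\ldots,Y_n}(y_2, \ldots, y_n) &= \int_0^{y_2}\lambda^{n}e^{-\lambda y_n}\,dy_1 \\
&= \lambda^{n}y_2\,e^{-\lambda y_n}, \qquad 0 < y_2 < y_3 < \cdots < y_n
\end{aligned}
\]

Continuing, we obtain

\[
\begin{aligned}
f_{Y_3,\ldots,Y_n}(y_3, \ldots, y_n) &= \int_0^{y_3}\lambda^{n}y_2\,e^{-\lambda y_n}\,dy_2 \\
&= \lambda^{n}\frac{y_3^{2}}{2}\,e^{-\lambda y_n}, \qquad 0 < y_3 < y_4 < \cdots < y_n
\end{aligned}
\]

The next integration yields

\[
f_{Y_4,\ldots,Y_n}(y_4, \ldots, y_n) = \lambda^{n}\frac{y_4^{3}}{3!}\,e^{-\lambda y_n}, \qquad 0 < y_4 < \cdots < y_n
\]

Continuing in this fashion gives

\[
f_{Y_n}(y_n) = \lambda^{n}\frac{y_n^{n-1}}{(n-1)!}\,e^{-\lambda y_n}, \qquad 0 < y_n
\]

which, in agreement with the result obtained in Example 3b, shows
that $X_1 + \cdots + X_n$ is a gamma random variable with parameters n and $\lambda$.

c. The conditional density of $Y_1, \ldots, Y_{n-1}$ given that $Y_n = t$ is, for
$0 < y_1 < \cdots < y_{n-1} < t$,

\[
\begin{aligned}
f_{Y_1,\ldots,Y_{n-1}|Y_n}(y_1, \ldots, y_{n-1} \mid t)
&= \frac{f_{Y_1,\ldots,Y_{n-1},Y_n}(y_1, \ldots, y_{n-1}, t)}{f_{Y_n}(t)} \\
&= \frac{\lambda^{n}e^{-\lambda t}}{\lambda e^{-\lambda t}(\lambda t)^{n-1}/(n-1)!} \\
&= \frac{(n-1)!}{t^{n-1}}
\end{aligned}
\]

Because f(y) = 1/t, 0 < y < t, is the density of a uniform random variable on
(0, t), it follows that conditional on $Y_n = t$, $Y_1, \ldots, Y_{n-1}$ are distributed as the order
statistics of n − 1 independent uniform (0, t) random variables.


*6.8 Exchangeable Random Variables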


The random variables $X_1, X_2, \ldots, X_n$ are said to be exchangeable if, for every
permutation $i_1, \ldots, i_n$ of the integers 1,...,n,

\[
P\{X_{i_1} \le x_1,\ X_{i_2} \le x_2,\ \ldots,\ X_{i_n} \le x_n\} = P\{X_1 \le x_1,\ X_2 \le x_2,\ \ldots,\ X_n \le x_n\}
\]

for all $x_1, \ldots, x_n$. That is, the n random variables are exchangeable if their joint
distribution is the same no matter in which order the variables are observed.

Discrete random variables will be exchangeable if

\[
P\{X_{i_1} = x_1,\ X_{i_2} = x_2,\ \ldots,\ X_{i_n} = x_n\} = P\{X_1 = x_1,\ X_2 = x_2,\ \ldots,\ X_n = x_n\}
\]

for all permutations $i_1, \ldots, i_n$, and all values $x_1, \ldots, x_n$. This is equivalent to stating that
$p(x_1, x_2, \ldots, x_n) = P\{X_1 = x_1, \ldots, X_n = x_n\}$ is a symmetric function of the vector
$(x_1, \ldots, x_n)$, which means that its value does not change when the values of the
vector are permuted.


Example 8a


Suppose that balls are withdrawn one at a time and without replacement from an
urn that initially contains $n$ balls, of which $k$ are considered special, in such a
manner that each withdrawal is equally likely to be any of the balls that remain
in the urn at the time. Let $X_i = 1$ if the $i$th ball withdrawn is special and let $X_i = 0$
otherwise. We will show that the random variables $X_1,\dots,X_n$ are exchangeable.
To do so, let $(x_1,\dots,x_n)$ be a vector consisting of $k$ ones and $n - k$ zeros.
However, before considering the joint mass function evaluated at $(x_1,\dots,x_n)$, let
us try to gain some insight by considering a fixed such vector; for instance,
consider the vector $(1,1,0,1,0,\dots,0,1)$, which is assumed to have $k$ ones and
$n - k$ zeros. Then

$$p(1,1,0,1,0,\dots,0,1) = \frac{k}{n}\,\frac{k-1}{n-1}\,\frac{n-k}{n-2}\,\frac{k-2}{n-3}\,\frac{n-k-1}{n-4}\cdots\frac{1}{2}\,\frac{1}{1}$$


which follows because the probability that the first ball is special is $k/n$, the
conditional probability that the next one is special is $(k - 1)/(n - 1)$, the
conditional probability that the next one is not special is $(n - k)/(n - 2)$, and so
on. By the same argument, it follows that $p(x_1,\dots,x_n)$ can be expressed as the
product of $n$ fractions. The successive denominator terms of these fractions will
go from $n$ down to 1. The numerator term at the location where the vector
$(x_1,\dots,x_n)$ is 1 for the $i$th time is $k - (i - 1)$, and where it is 0 for the $i$th time it is
$n - k - (i - 1)$. Hence, since the vector $(x_1,\dots,x_n)$ consists of $k$ ones and $n - k$
zeros, we obtain

$$p(x_1,\dots,x_n) = \frac{k!(n-k)!}{n!}, \qquad x_i = 0, 1, \quad \sum_{i=1}^n x_i = k$$

Since this is a symmetric function of $(x_1,\dots,x_n)$, it follows that the random
variables are exchangeable.
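
The formula of Example 8a can be checked by brute force for a small urn. The sketch below is not part of the original text: the function name `urn_joint_pmf` and the choice n = 5, k = 2 are illustrative assumptions. It treats the balls as distinguishable, enumerates all n! equally likely withdrawal orders, and tabulates the resulting joint mass function with exact fractions.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def urn_joint_pmf(n, k):
    """Joint pmf of (X_1, ..., X_n), found by enumerating all n! orderings.

    Balls 0..k-1 are the special ones (a labeling chosen only for illustration).
    """
    is_special = [1] * k + [0] * (n - k)
    weight = Fraction(1, factorial(n))      # each ordering is equally likely
    pmf = {}
    for order in permutations(range(n)):
        x = tuple(is_special[ball] for ball in order)
        pmf[x] = pmf.get(x, Fraction(0)) + weight
    return pmf

n, k = 5, 2
pmf = urn_joint_pmf(n, k)
expected = Fraction(factorial(k) * factorial(n - k), factorial(n))

# Every 0-1 vector with k ones receives the same probability k!(n-k)!/n!.
assert all(p == expected for p in pmf.values())
```

Because every vector with k ones gets the same value, permuting the coordinates cannot change the probability, which is the symmetry the example relies on.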


Remark Another way to obtain the preceding formula for the joint probability mass
function is to regard all the $n$ balls as distinguishable from one another. Then, since
the outcome of the experiment is an ordering of these balls, it follows that there are
$n!$ equally likely outcomes. Finally, because the number of outcomes having special
and nonspecial balls in specified places is equal to the number of ways of permuting
the special and the nonspecial balls among themselves, namely $k!(n - k)!$, we
obtain the preceding mass function.


It is easily seen that if $X_1, X_2, \dots, X_n$ are exchangeable, then each $X_i$ has the same
probability distribution. For instance, if $X$ and $Y$ are exchangeable discrete random
variables, then

$$P\{X = x\} = \sum_y P\{X = x, Y = y\} = \sum_y P\{X = y, Y = x\} = P\{Y = x\}$$

For example, it follows from Example 8a that the $i$th ball withdrawn will be special
with probability $k/n$, which is intuitively clear, since each of the $n$ balls is equally
likely to be the $i$th one selected.


Example 8b 


In Example 8a, let $Y_1$ denote the selection number of the first special ball
withdrawn, let $Y_2$ denote the additional number of balls that are then withdrawn
until the second special ball appears, and, in general, let $Y_i$ denote the additional
number of balls withdrawn after the $(i - 1)$st special ball is selected until the $i$th is
selected, $i = 1,\dots,k$. For instance, if $n = 4$, $k = 2$ and
$X_1 = 1, X_2 = 0, X_3 = 0, X_4 = 1$, then $Y_1 = 1, Y_2 = 3$. Now,

$$Y_1 = i_1, Y_2 = i_2, \dots, Y_k = i_k \iff X_{i_1} = X_{i_1+i_2} = \cdots = X_{i_1+\cdots+i_k} = 1, \quad X_j = 0 \text{ otherwise};$$

thus, from the joint mass function of the $X_i$, we obtain

$$P\{Y_1 = i_1, Y_2 = i_2, \dots, Y_k = i_k\} = \frac{k!(n-k)!}{n!}, \qquad i_1 + \cdots + i_k \le n$$

Hence, the random variables $Y_1,\dots,Y_k$ are exchangeable. Note that it follows
from this result that the number of cards one must select from a well-shuffled
deck until an ace appears has the same distribution as the number of additional
cards one must select after the first ace appears until the next one does, and so
on.
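
The card statement at the end of Example 8b can be confirmed exactly. The following sketch is an illustration only; `gap_vectors` and `marginal_pmf` are hypothetical helper names, and the deck is modeled as 52 positions of which 4 hold aces. By Example 8a, every set of 4 positions is equally likely, so it suffices to list the gap vectors and compare marginal distributions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def gap_vectors(n, k):
    """All equally likely values of (Y_1, ..., Y_k) for n balls, k special."""
    vectors = []
    for positions in combinations(range(1, n + 1), k):
        previous = (0,) + positions[:-1]
        vectors.append(tuple(b - a for a, b in zip(previous, positions)))
    return vectors

def marginal_pmf(vectors, i):
    """Exact pmf of the coordinate with (0-based) index i."""
    counts = Counter(y[i] for y in vectors)
    return {value: Fraction(c, len(vectors)) for value, c in counts.items()}

vectors = gap_vectors(52, 4)               # positions of the 4 aces in a deck
first_ace = marginal_pmf(vectors, 0)       # cards drawn until the first ace
second_gap = marginal_pmf(vectors, 1)      # additional cards until the second ace

assert first_ace == second_gap
```

The same comparison holds for any pair of coordinates, as exchangeability requires.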


Example 8c 


The following is known as Polya’s urn model: Suppose that an urn initially 
contains n red and m blue balls. At each stage, a ball is randomly chosen, its 
color is noted, and it is then replaced along with another ball of the same color. 
Let $X_i = 1$ if the $i$th ball selected is red and let it equal 0 if the $i$th ball is blue,
$i \ge 1$. To obtain a feeling for the joint probabilities of these $X_i$, note the following
special cases:

$$P\{X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 1, X_5 = 0\} = \frac{n}{n+m}\,\frac{n+1}{n+m+1}\,\frac{m}{n+m+2}\,\frac{n+2}{n+m+3}\,\frac{m+1}{n+m+4} = \frac{n(n+1)(n+2)m(m+1)}{(n+m)(n+m+1)(n+m+2)(n+m+3)(n+m+4)}$$

and

$$P\{X_1 = 0, X_2 = 1, X_3 = 0, X_4 = 1, X_5 = 1\} = \frac{m}{n+m}\,\frac{n}{n+m+1}\,\frac{m+1}{n+m+2}\,\frac{n+1}{n+m+3}\,\frac{n+2}{n+m+4} = \frac{n(n+1)(n+2)m(m+1)}{(n+m)(n+m+1)(n+m+2)(n+m+3)(n+m+4)}$$


By the same reasoning, for any sequence $x_1,\dots,x_k$ that contains $r$ ones and
$k - r$ zeros, we have

$$P\{X_1 = x_1,\dots,X_k = x_k\} = \frac{n(n+1)\cdots(n+r-1)\,m(m+1)\cdots(m+k-r-1)}{(n+m)\cdots(n+m+k-1)}$$

Therefore, for any value of $k$, the random variables $X_1,\dots,X_k$ are exchangeable.
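
The claim that the probability of a color sequence in Polya's urn depends only on how many reds and blues it contains can be checked directly. This is a minimal sketch under assumed values n = 3 red and m = 2 blue; `polya_sequence_prob` is an illustrative name, not notation from the text.

```python
from fractions import Fraction
from itertools import permutations

def polya_sequence_prob(xs, n, m):
    """P{X_1 = xs[0], ..., X_k = xs[k-1]} for an urn starting with n red, m blue."""
    red, blue = n, m
    prob = Fraction(1)
    for x in xs:
        total = red + blue
        if x == 1:                       # a red ball is drawn, then one red is added
            prob *= Fraction(red, total)
            red += 1
        else:                            # a blue ball is drawn, then one blue is added
            prob *= Fraction(blue, total)
            blue += 1
    return prob

n, m = 3, 2
base = (1, 1, 0, 1, 0)                   # the first special case in the example
values = {polya_sequence_prob(p, n, m) for p in set(permutations(base))}

# Every rearrangement of the sequence has the same probability.
assert len(values) == 1
```

With n = 3 and m = 2 the common value is 3·4·5·2·3 / (5·6·7·8·9) = 1/42, in agreement with the displayed formula for r = 3, k = 5.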


An interesting corollary of the exchangeability in this model is that the probability
that the $i$th ball selected is red is the same as the probability that the first ball
selected is red, namely, $\frac{n}{n+m}$. (For an intuitive argument for this initially
nonintuitive result, imagine that all the $n + m$ balls initially in the urn are of
different types. That is, one is a red ball of type 1, one is a red ball of type 2, ...,
one is a red ball of type $n$, one is a blue ball of type 1, and so on, down to the
blue ball of type $m$. Suppose that when a ball is selected it is replaced along with
another of its type. Then, by symmetry, the $i$th ball selected is equally likely to be
of any of the $n + m$ distinct types. Because $n$ of these $n + m$ types are red, the
probability is $\frac{n}{n+m}$.)
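
The corollary can also be verified without appealing to exchangeability, by conditioning on the color of the first draw and recursing on the updated urn. The sketch below is illustrative; the function name and the starting urn (3 red, 2 blue) are assumptions.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def prob_red_on_draw(i, red, blue):
    """P{the ith draw from now is red} for a Polya urn holding red, blue balls."""
    p_red = Fraction(red, red + blue)
    if i == 1:
        return p_red
    # Condition on the next draw, which adds one ball of the drawn color.
    return (p_red * prob_red_on_draw(i - 1, red + 1, blue)
            + (1 - p_red) * prob_red_on_draw(i - 1, red, blue + 1))

results = [prob_red_on_draw(i, 3, 2) for i in range(1, 11)]
assert all(p == Fraction(3, 5) for p in results)
```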


Our final example deals with continuous random variables that are exchangeable. 


Example 8d 


Let $X_1, X_2, \dots, X_n$ be independent uniform $(0, 1)$ random variables, and denote
their order statistics by $X_{(1)},\dots,X_{(n)}$. That is, $X_{(j)}$ is the $j$th smallest of
$X_1, X_2, \dots, X_n$. Also, let

$$Y_1 = X_{(1)}, \qquad Y_i = X_{(i)} - X_{(i-1)}, \quad i = 2,\dots,n$$

Show that $Y_1,\dots,Y_n$ are exchangeable.


Solution 
The transformations

$$y_1 = x_1, \qquad y_i = x_i - x_{i-1}, \quad i = 2,\dots,n$$

yield

$$x_i = y_1 + \cdots + y_i, \qquad i = 1,\dots,n$$

Because the Jacobian of the preceding transformations is easily seen to equal
1, Equation (7.3) gives

$$f_{Y_1,\dots,Y_n}(y_1, y_2, \dots, y_n) = f(y_1, y_1 + y_2, \dots, y_1 + \cdots + y_n)$$

where $f$ is the joint density function of the order statistics. Hence, from Equation
(6.1), we obtain that

$$f_{Y_1,\dots,Y_n}(y_1, y_2, \dots, y_n) = n!, \qquad 0 < y_1 < y_1 + y_2 < \cdots < y_1 + \cdots + y_n < 1$$

or, equivalently,

$$f_{Y_1,\dots,Y_n}(y_1, y_2, \dots, y_n) = n!, \qquad 0 < y_i < 1, \; i = 1,\dots,n, \quad y_1 + \cdots + y_n < 1$$

Because the preceding joint density is a symmetric function of $y_1,\dots,y_n$, we see
that the random variables $Y_1,\dots,Y_n$ are exchangeable.
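
A quick simulation gives a sanity check on Example 8d. Exchangeable variables share a common marginal distribution, so the sample means of the spacings should agree; since $Y_1 = X_{(1)}$ has mean $1/(n+1)$, each mean should be close to that value. The sketch below is illustrative only: the helper names, n = 4, the seed, and the number of trials are arbitrary assumptions, and a simulation supports the result rather than proving it.

```python
import random

def spacings(n, rng):
    """Y_1 = X_(1) and Y_i = X_(i) - X_(i-1) for n independent uniform(0,1) draws."""
    xs = sorted(rng.random() for _ in range(n))
    return [xs[0]] + [b - a for a, b in zip(xs, xs[1:])]

def mean_spacings(n, trials, seed=0):
    """Sample mean of each spacing over the given number of trials."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(trials):
        for i, y in enumerate(spacings(n, rng)):
            totals[i] += y
    return [t / trials for t in totals]

means = mean_spacings(n=4, trials=100_000)
# Each entry should be near 1/(n + 1) = 0.2.
print([round(m, 3) for m in means])
```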


Summary 


The joint cumulative probability distribution function of the pair of random variables X 
and Y is defined by 


$$F(x, y) = P\{X \le x, Y \le y\}, \qquad -\infty < x, y < \infty$$

All probabilities regarding the pair can be obtained from $F$. To find the individual
probability distribution functions of $X$ and $Y$, use

$$F_X(x) = \lim_{y\to\infty} F(x, y) \qquad F_Y(y) = \lim_{x\to\infty} F(x, y)$$


If X and Y are both discrete random variables, then their joint probability mass 
function is defined by 


$$p(i, j) = P\{X = i, Y = j\}$$

The individual mass functions are

$$P\{X = i\} = \sum_j p(i, j) \qquad P\{Y = j\} = \sum_i p(i, j)$$


The random variables X and Y are said to be jointly continuous if there is a function
$f(x, y)$, called the joint probability density function, such that for any two-dimensional
set $C$,

$$P\{(X, Y) \in C\} = \iint_C f(x, y)\, dx\, dy$$

It follows from the preceding formula that

$$P\{x < X < x + dx,\; y < Y < y + dy\} \approx f(x, y)\, dx\, dy$$

If X and Y are jointly continuous, then they are individually continuous with density
functions

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx$$


The random variables X and Y are independent if, for all sets A and B,

$$P\{X \in A, Y \in B\} = P\{X \in A\}P\{Y \in B\}$$

If the joint distribution function (or the joint probability mass function in the discrete
case, or the joint density function in the continuous case) factors into a part
depending only on x and a part depending only on y, then X and Y are independent.


In general, the random variables $X_1,\dots,X_n$ are independent if, for all sets of real
numbers $A_1,\dots,A_n$,

$$P\{X_1 \in A_1,\dots,X_n \in A_n\} = P\{X_1 \in A_1\}\cdots P\{X_n \in A_n\}$$


If X and Y are independent continuous random variables, then the distribution 
function of their sum can be obtained from the identity 


$$F_{X+Y}(a) = \int_{-\infty}^{\infty} F_X(a - y) f_Y(y)\, dy$$

If $X_i$, $i = 1,\dots,n$, are independent normal random variables with respective
parameters $\mu_i$ and $\sigma_i^2$, $i = 1,\dots,n$, then $\sum_{i=1}^n X_i$ is normal with parameters
$\sum_{i=1}^n \mu_i$ and $\sum_{i=1}^n \sigma_i^2$.

If $X_i$, $i = 1,\dots,n$, are independent Poisson random variables with respective
parameters $\lambda_i$, $i = 1,\dots,n$, then $\sum_{i=1}^n X_i$ is Poisson with parameter $\sum_{i=1}^n \lambda_i$.
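
The Poisson statement can be checked numerically for two summands by convolving the mass functions. The sketch is illustrative; the rates 1.5 and 2.5 and the function names are arbitrary assumptions.

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """P{X = k} for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

def pmf_of_sum(lam1, lam2, k):
    """P{X + Y = k} by discrete convolution for independent Poisson X and Y."""
    return sum(poisson_pmf(lam1, j) * poisson_pmf(lam2, k - j)
               for j in range(k + 1))

# The convolution should match a Poisson pmf with parameter 1.5 + 2.5 = 4.
for k in range(10):
    assert abs(pmf_of_sum(1.5, 2.5, k) - poisson_pmf(4.0, k)) < 1e-12
```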

If X and Y are discrete random variables, then the conditional probability mass 
function of X given that Y = y is defined by 


$$P\{X = x \mid Y = y\} = \frac{p(x, y)}{p_Y(y)}$$

where $p$ is their joint probability mass function. Also, if X and Y are jointly continuous
with joint density function $f$, then the conditional probability density function of X
given that $Y = y$ is given by

$$f_{X\mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$


The ordered values $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ of a set of independent and identically
distributed random variables are called the order statistics of that set. If the random
variables are continuous and have density function $f$, then the joint density function
of the order statistics is

$$f(x_1,\dots,x_n) = n!\, f(x_1)\cdots f(x_n), \qquad x_1 \le x_2 \le \cdots \le x_n$$

The random variables $X_1,\dots,X_n$ are called exchangeable if the joint distribution of
$X_{i_1},\dots,X_{i_n}$ is the same for every permutation $i_1,\dots,i_n$ of $1,\dots,n$.


Problems 


6.1. Two fair dice are rolled. Find the joint probability mass function 
of X and Y when 
a. X is the largest value obtained on any die and Y is the sum of 
the values; 
b. X is the value on the first die and Y is the larger of the two
values;
c. X is the smallest and Y is the largest value obtained on the 
dice. 


6.2. Suppose that 3 balls are chosen without replacement from an
urn consisting of 5 white and 8 red balls. Let $X_i$ equal 1 if the $i$th ball
selected is white, and let it equal 0 otherwise. Give the joint
probability mass function of

a. $X_1, X_2$;
b. $X_1, X_2, X_3$.


6.3. In Problem 6.2, suppose that the white balls are numbered,
and let $Y_i$ equal 1 if the $i$th white ball is selected and 0 otherwise.
Find the joint probability mass function of

a. $Y_1, Y_2$;
b. $Y_1, Y_2, Y_3$.


6.4. Repeat Problem 6.2 when the ball selected is replaced in the
urn before the next selection.
6.5. Repeat Problem 6.3a when the ball selected is replaced in
the urn before the next selection.
6.6. The severity of a certain cancer is designated by one of the
grades 1, 2, 3, 4, with 1 being the least severe and 4 the most severe.
If X is the score of an initially diagnosed patient and Y the score of
that patient after three months of treatment, hospital data indicate
that $p(i, j) = P(X = i, Y = j)$ is given by
p(1,1) = .08, p(1,2) = .06, p(1,3) = .04, p(1,4) = .02
p(2,1) = .06, p(2,2) = .12, p(2,3) = .08, p(2,4) = .04
p(3,1) = .03, p(3,2) = .09, p(3,3) = .12, p(3,4) = .06
p(4,1) = .01, p(4,2) = .03, p(4,3) = .07, p(4,4) = .09


a. Find the probability mass functions of X and of Y; 
b. Find E[X] and E[Y]. 
c. Find Var (X) and Var (Y). 


6.7. Consider a sequence of independent Bernoulli trials, each of
which is a success with probability $p$. Let $X_1$ be the number of
failures preceding the first success, and let $X_2$ be the number of
failures between the first two successes. Find the joint mass function
of $X_1$ and $X_2$.

6.8. The joint probability density function of X and Y is given by
$$f(x, y) = c(y^2 - x^2)e^{-y}, \qquad -y \le x \le y, \; 0 < y < \infty$$


a. Find c. 
b. Find the marginal densities of X and Y. 
c. Find E[X]. 


6.9. The joint probability density function of X and Y is given by
$$f(x, y) = \frac{6}{7}\left(x^2 + \frac{xy}{2}\right), \qquad 0 < x < 1, \; 0 < y < 2$$

a. Verify that this is indeed a joint density function.
b. Compute the density function of X.
c. Find $P\{X > Y\}$.
d. Find $P\{Y > \tfrac{1}{2} \mid X < \tfrac{1}{2}\}$.


e. Find E[X]. 
f. Find E[Y]. 


6.10. The joint probability density function of X and Y is given by
$$f(x, y) = e^{-(x+y)}, \qquad 0 \le x < \infty, \; 0 \le y < \infty$$


Find (a) P{X < Y} and (b) P{X < a}. 

6.11. In Example 1d, verify that $f(x, y) = 2e^{-x}e^{-2y}$, $0 < x < \infty$,
$0 < y < \infty$, is indeed a joint density function. That is, check that
$f(x, y) \ge 0$, and that $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$.

6.12. The number of people who enter a drugstore in a given hour is 
a Poisson random variable with parameter $\lambda = 10$. Compute the
conditional probability that at most 3 men entered the drugstore, 
given that 10 women entered in that hour. What assumptions have 
you made? 

6.13. A man and a woman agree to meet at a certain location about
12:30 P.M. If the man arrives at a time uniformly distributed between 
12:15 and 12:45, and if the woman independently arrives at a time 
uniformly distributed between 12:00 and 1 P.M., find the probability 
that the first to arrive waits no longer than 5 minutes. What is the 
probability that the man arrives first? 

6.14. An ambulance travels back and forth at a constant speed along 
a road of length L. At a certain moment of time, an accident occurs 
at a point uniformly distributed on the road. [That is, the distance of 
the point from one of the fixed ends of the road is uniformly 
distributed over (0, L).] Assuming that the ambulance’s location at the 
moment of the accident is also uniformly distributed, and assuming
independence of the variables, compute the distribution of the
distance of the ambulance from the accident.
6.15. The random vector (X, Y) is said to be uniformly distributed
over a region R in the plane if, for some constant c, its joint density is

$$f(x, y) = \begin{cases} c & \text{if } (x, y) \in R \\ 0 & \text{otherwise} \end{cases}$$

a. Show that 1/c = area of region R.
Suppose that (X, Y) is uniformly distributed over the square 
centered at (0, 0) and with sides of length 2. 

b. Show that X and Y are independent, with each being 
distributed uniformly over (— 1,1). 

c. What is the probability that (X, Y) lies in the circle of radius 1
centered at the origin? That is, find $P\{X^2 + Y^2 \le 1\}$.


6.16. Suppose that n points are independently chosen at random on 
the circumference of a circle, and we want the probability that they all 
lie in some semicircle. That is, we want the probability that there is a 
line passing through the center of the circle such that all the points 
are on one side of that line, as shown in the following diagram: 


Let $P_1,\dots,P_n$ denote the $n$ points. Let $A$ denote the event that all the
points are contained in some semicircle, and let $A_i$ be the event that
all the points lie in the semicircle beginning at the point $P_i$ and going
clockwise for 180°, $i = 1,\dots,n$.

a. Express $A$ in terms of the $A_i$.

b. Are the $A_i$ mutually exclusive?

c. Find P(A). 
6.17. Three points $X_1, X_2, X_3$ are selected at random on a line L.
What is the probability that $X_2$ lies between $X_1$ and $X_3$?
6.18. Let $X_1$ and $X_2$ be independent binomial random variables with
$X_i$ having parameters $(n_i, p_i)$, $i = 1, 2$. Find

a. $P(X_1 X_2 = 0)$;

b. $P(X_1 + X_2 = 1)$;

c. $P(X_1 + X_2 = 2)$.


6.19. Show that $f(x, y) = 1/x$, $0 < y < x < 1$, is a joint density
function. Assuming that $f$ is the joint density function of X, Y, find
a. the marginal density of Y;
b. the marginal density of X;
c. E[X];
d. E[Y].


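6.20. The joint density of X and Y is given by

$$f(x, y) = \begin{cases} x e^{-(x+y)} & x > 0, \; y > 0 \\ 0 & \text{otherwise} \end{cases}$$

Are X and Y independent? If, instead, $f(x, y)$ were given by

$$f(x, y) = \begin{cases} 2 & 0 < x < y, \; 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

would X and Y be independent?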
6.21. Let
$$f(x, y) = 24xy, \qquad 0 \le x \le 1, \; 0 \le y \le 1, \; 0 \le x + y \le 1$$

and let it equal 0 otherwise.
a. Show that $f(x, y)$ is a joint probability density function.
b. Find E[X]. 
c. Find E[Y]. 


6.22. The joint density function of X and Y is

$$f(x, y) = \begin{cases} x + y & 0 < x < 1, \; 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}$$

a. Are X and Y independent?
b. Find the density function of X.
c. Find $P\{X + Y < 1\}$.


6.23. The random variables X and Y have joint density function
$$f(x, y) = 12xy(1 - x), \qquad 0 < x < 1, \; 0 < y < 1$$
and equal to 0 otherwise.
a. Are X and Y independent?
b. Find E[X].
c. Find E[Y].
d. Find Var(X).
e. Find Var(Y).


6.24. Consider independent trials, each of which results in outcome
$i$, $i = 0, 1, \dots, k$, with probability $p_i$, $\sum_{i=0}^k p_i = 1$. Let $N$ denote the
number of trials needed to obtain an outcome that is not equal to 0,
and let $X$ be that outcome.

a. Find $P\{N = n\}$, $n \ge 1$.

b. Find P{X = j},j =1,...,k. 

c. Show that P{N = n,X = j} = P{N = n}P{X = j}. 

d. Is it intuitive to you that N is independent of X? 

e. Is it intuitive to you that X is independent of N? 


6.25. Suppose that $10^6$ people arrive at a service station at times that
are independent random variables, each of which is uniformly
distributed over $(0, 10^6)$. Let $N$ denote the number that arrive in the
first hour. Find an approximation for $P\{N = i\}$.
6.26. Suppose that A, B, C, are independent random variables, each 
being uniformly distributed over (0, 1). 

a. What is the joint cumulative distribution function of A, B, C? 

b. What is the probability that all of the roots of the equation
$Ax^2 + Bx + C = 0$ are real?


6.27. If $X_1$ and $X_2$ are independent exponential random variables
with respective parameters $\lambda_1$ and $\lambda_2$, find the distribution of
$Z = X_1/X_2$. Also compute $P\{X_1 < X_2\}$.
6.28. The time that it takes to service a car is an exponential random 
variable with rate 1. 
a. If A. J. brings his car in at time 0 and M. J. brings her car in at 
time t, what is the probability that M. J.’s car is ready before 
A. J.’s car? (Assume that service times are independent and 
service begins upon arrival of the car.) 
b. If both cars are brought in at time 0, with work starting on M. 
J.’s car only when A. J.’s car has been completely serviced, 
what is the probability that M. J.’s car is ready before time 2? 
6.29. The gross daily sales at a certain restaurant are a normal
random variable with mean $2200 and standard deviation $230. 
What is the probability that 
a. the total gross sales over the next 2 days exceeds $5000; 
b. daily sales exceed $2000 in at least 2 of the next 3 days? 
What independence assumptions have you made? 


6.30. Jill’s bowling scores are approximately normally distributed with 
mean 170 and standard deviation 20, while Jack’s scores are 
approximately normally distributed with mean 160 and standard 
deviation 15. If Jack and Jill each bowl one game, then assuming 
that their scores are independent random variables, approximate the 
probability that 

a. Jack’s score is higher; 

b. the total of their scores is above 350. 


6.31. According to the U.S. National Center for Health Statistics, 25.2 
percent of males and 23.6 percent of females never eat breakfast. 
Suppose that random samples of 200 men and 200 women are 
chosen. Approximate the probability that 
a. at least 110 of these 400 people never eat breakfast; 
b. the number of the women who never eat breakfast is at least 
as large as the number of the men who never eat breakfast. 


6.32. Monthly sales are independent normal random variables with 
mean 100 and standard deviation 5. 
a. Find the probability that exactly 3 of the next 6 months have 
sales greater than 100. 
b. Find the probability that the total of the sales in the next 4 
months is greater than 420. 


6.33. Let $X_1$ and $X_2$ be independent normal random variables, each
having mean 10 and variance $\sigma^2$. Which probability is larger:
a. $P(X_1 > 15)$ or $P(X_1 + X_2 > 25)$;
b. $P(X_1 > 15)$ or $P(X_1 + X_2 > 30)$?


6.34. Suppose X and Y are independent normal random variables
with mean 10 and variance 4. Find $x$ such that
$P(X + Y > x) = P(X > 15)$.

6.35. Teams 1, 2, 3, 4 are all scheduled to play each of the other
teams 10 times. Whenever team $i$ plays team $j$, team $i$ is the winner
with probability $P_{i,j}$, where

$$P_{1,2} = .6, \quad P_{1,3} = .7, \quad P_{1,4} = .75$$

$$P_{2,1} = .4, \quad P_{2,3} = .6, \quad P_{2,4} = .70$$


a. Approximate the probability that team 1 wins at least 20 
games. 
Suppose we want to approximate the probability that team 2 
wins at least as many games as does team 1. To do so, let X 
be the number of games that team 2 wins against team 1, let 
Y be the total number of games that team 2 wins against 
teams 3 and 4, and let Z be the total number of games that 
team 1 wins against teams 3 and 4. 

b. Are X, Y, Z independent?

c. Express the event that team 2 wins at least as many games 
as does team 1 in terms of the random variables X,Y, Z. 

d. Approximate the probability that team 2 wins at least as many 
games as team 1. 


Hint: Approximate the distribution of any binomial random variable 
by a normal with the same mean and variance. 
6.36. Let $X_1,\dots,X_{10}$ be independent with the same continuous
distribution function $F$, and let $m$ be the median of that distribution.
That is, $F(m) = .5$.
a. If $N$ is the number of the values $X_1,\dots,X_{10}$ that are less than
$m$, what type of random variable is $N$?
b. Let $X_{(1)} < X_{(2)} < \cdots < X_{(10)}$ be the values $X_1,\dots,X_{10}$ arranged
in increasing order. That is, $X_{(i)}$ is, for $i = 1,\dots,10$, the $i$th
smallest of $X_1,\dots,X_{10}$. Find $P(X_{(2)} < m < X_{(8)})$.


6.37. The expected number of typographical errors on a page of a 
certain magazine is .2. What is the probability that an article of 10 
pages contains (a) 0 and (b) 2 or more typographical errors? Explain 
your reasoning! 
6.38. The monthly worldwide average number of airplane crashes of 
commercial airlines is 2.2. What is the probability that there will be 

a. more than 2 such accidents in the next month? 

b. more than 4 such accidents in the next 2 months? 

c. more than 5 such accidents in the next 3 months? 


Explain your reasoning! 
6.39. In Problem 6.4, calculate the conditional probability mass
function of $X_1$ given that
a. $X_2 = 1$;
b. $X_2 = 0$.


6.40. In Problem 6.3, calculate the conditional probability mass
function of $Y_1$ given that
a. $Y_2 = 1$;
b. $Y_2 = 0$.


6.41. The discrete integer valued random variables X, Y, Z are
independent if for all $i, j, k$
$$P(X = i, Y = j, Z = k) = P(X = i)P(Y = j)P(Z = k)$$

Show that if X, Y, Z are independent then X and Y are independent.
That is, show that the preceding implies that
$$P(X = i, Y = j) = P(X = i)P(Y = j)$$


6.42. Choose a number X at random from the set of numbers 
{1,2,3,4,5}. Now choose a number at random from the subset no 
larger than X, that is, from {1, ...,X}. Call this second number Y. 
a. Find the joint mass function of X and Y. 
b. Find the conditional mass function of X given that Y = i. Do it 
fori = 1,2,3,4,5. 
c. Are X and Y independent? Why? 


6.43. Two dice are rolled. Let X and Y denote, respectively, the 
largest and smallest values obtained. Compute the conditional mass 
function of Y given X = i, fori = 1,2,...,6. Are X and Y independent? 


Why? 
6.44. The joint probability mass function of X and Y is given by 
1 1 
2,1) = : 2,2) = Z 


a. Compute the conditional mass function of X given 
Y=i1,i=1,2. 

b. Are X and Y independent? 

c. Compute P{XY < 3}, P{X + Y > 2}, P{X/Y > 1}. 


6.45. The joint density function of X and Y is given by
$$f(x, y) = x e^{-x(y+1)}, \qquad x > 0, \; y > 0$$
a. Find the conditional density of X, given $Y = y$, and that of Y,
given $X = x$.
b. Find the density function of $Z = XY$.


6.46. The joint density of X and Y is
$$f(x, y) = c(x^2 - y^2)e^{-x}, \qquad 0 \le x < \infty, \; -x \le y \le x$$


Find the conditional distribution of Y, given X = x. 

6.47. An insurance company supposes that each person has an
accident parameter and that the yearly number of accidents of
someone whose accident parameter is $\lambda$ is Poisson distributed with
mean $\lambda$. They also suppose that the parameter value of a newly
insured person can be assumed to be the value of a gamma random
variable with parameters $s$ and $\alpha$. If a newly insured person has $n$
accidents in her first year, find the conditional density of her accident 
parameter. Also, determine the expected number of accidents that 
she will have in the following year. 

6.48. If X,,X2,X3 are independent random variables that are 
uniformly distributed over (0, 1), compute the probability that the 
largest of the three is greater than the sum of the other two. 

6.49. A complex machine is able to operate effectively as long as at 
least 3 of its 5 motors are functioning. If each motor independently 
functions for a random amount of time with density function 

$f(x) = xe^{-x}$, $x > 0$, compute the density function of the length of time
that the machine functions. 

6.50. If 3 trucks break down at points randomly distributed on a road 
of length L, find the probability that no 2 of the trucks are within a 
distance d of each other when d < L/2. 

6.51. Consider a sample of size 5 from a uniform distribution over (0,
1). Compute the probability that the median is in the interval $\left(\tfrac{1}{4}, \tfrac{3}{4}\right)$.


6.52. If $X_1, X_2, X_3, X_4, X_5$ are independent and identically distributed
exponential random variables with the parameter $\lambda$, compute

a. $P\{\min(X_1,\dots,X_5) \le a\}$;

b. $P\{\max(X_1,\dots,X_5) \le a\}$.


6.53. Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the order statistics of a set of $n$
independent uniform (0, 1) random variables. Find the conditional
distribution of $X_{(n)}$ given that $X_{(1)} = s_1, X_{(2)} = s_2, \dots, X_{(n-1)} = s_{n-1}$.
6.54. Let $Z_1$ and $Z_2$ be independent standard normal random
variables. Show that X, Y has a bivariate normal distribution when
$X = Z_1$, $Y = Z_1 + Z_2$.
6.55. Derive the distribution of the range of a sample of size 2 from a
distribution having density function $f(x) = 2x$, $0 < x < 1$.

6.56. Let X and Y denote the coordinates of a point uniformly chosen
in the circle of radius 1 centered at the origin. That is, their joint
density is

$$f(x, y) = \frac{1}{\pi}, \qquad x^2 + y^2 \le 1$$

Find the joint density function of the polar coordinates
$R = (X^2 + Y^2)^{1/2}$ and $\Theta = \tan^{-1}(Y/X)$.

6.57. If X and Y are independent random variables both uniformly
distributed over (0, 1), find the joint density function of
$R = \sqrt{X^2 + Y^2}$, $\Theta = \tan^{-1}(Y/X)$.

6.58. If $U$ is uniform on $(0, 2\pi)$ and $Z$, independent of $U$, is
exponential with rate 1, show directly (without using the results of
Example 7b) that X and Y defined by

$$X = \sqrt{2Z}\cos U$$
$$Y = \sqrt{2Z}\sin U$$

are independent standard normal random variables. 
6.59. X and Y have joint density function

$$f(x, y) = \frac{1}{x^2 y^2}, \qquad x \ge 1, \; y \ge 1$$


a. Compute the joint density function of U = XY,V = X/Y. 
b. What are the marginal densities? 


6.60. If X and Y are independent and identically distributed uniform
random variables on (0, 1), compute the joint density of
a. $U = X + Y$, $V = X/Y$;
b. $U = X$, $V = X/Y$;
c. $U = X + Y$, $V = X/(X + Y)$.


6.61. Repeat Problem 6.60 when X and Y are independent 
exponential random variables, each with parameter A = 1. 

6.62. If $X_1$ and $X_2$ are independent exponential random variables,
each having parameter $\lambda$, find the joint density function of
$Y_1 = X_1 + X_2$ and $Y_2 = e^{X_1}$.

6.63. If X, Y, and Z are independent random variables having
identical density functions $f(x) = e^{-x}$, $0 < x < \infty$, derive the joint
distribution of $U = X + Y$, $V = X + Z$, $W = Y + Z$.
6.64. In Example 8b, let $Y_{k+1} = n + 1 - \sum_{i=1}^k Y_i$. Show that
$Y_1,\dots,Y_k, Y_{k+1}$ are exchangeable. Note that $Y_{k+1}$ is the number of
balls one must observe to obtain a special ball if one considers the
balls in their reverse order of withdrawal.
6.65. Consider an urn containing $n$ balls numbered $1,\dots,n$, and
suppose that $k$ of them are randomly withdrawn. Let $X_i$ equal 1 if ball
number $i$ is removed and let $X_i$ be 0 otherwise. Show that $X_1,\dots,X_n$
are exchangeable.


Theoretical exercises 


6.1. Suppose X, Y have a joint distribution function $F(x, y)$. Show
how to obtain the distribution functions $F_X(x) = P\{X \le x\}$ and
$F_Y(y) = P\{Y \le y\}$.
6.2. Suppose that X and Y are integer valued random variables and
have a joint distribution function $F(i, j) = P(X \le i, Y \le j)$.
a. Give an expression, in terms of the joint distribution function,
for $P(X = i, Y \le j)$.
b. Give an expression, in terms of the joint distribution function,
for $P(X = i, Y = j)$.


6.3. Suggest a procedure for using Buffon’s needle problem to
estimate $\pi$. Surprisingly enough, this was once a common method of
evaluating $\pi$.

6.4. Solve Buffon’s needle problem when $L > D$.

ANSWER: $\frac{2L}{\pi D}(1 - \sin\theta) + 2\theta/\pi$, where $\cos\theta = D/L$.


6.5. If X and Y are independent continuous positive random 
variables, express the density function of (a) Z = X/Y and (b) Z = XY 
in terms of the density functions of X and Y. Evaluate the density 
functions in the special case where X and Y are both exponential 
random variables. 

6.6. If X and Y are jointly continuous with joint density function
$f_{X,Y}(x, y)$, show that $X + Y$ is continuous with density function

$$f_{X+Y}(t) = \int_{-\infty}^{\infty} f_{X,Y}(x, t - x)\, dx$$


6.7.
a. If X has a gamma distribution with parameters $(t, \lambda)$, what is
the distribution of $cX$, $c > 0$?
b. Show that
$$\frac{1}{2\lambda}\chi^2_{2n}$$
has a gamma distribution with parameters $n, \lambda$ when $n$ is a
positive integer and $\chi^2_{2n}$ is a chi-squared random variable
with $2n$ degrees of freedom.


6.8. Let X and Y be independent continuous random variables with
respective hazard rate functions $\lambda_X(t)$ and $\lambda_Y(t)$, and set
$W = \min(X, Y)$.
a. Determine the distribution function of W in terms of those of X
and Y.
b. Show that $\lambda_W(t)$, the hazard rate function of W, is given by
$$\lambda_W(t) = \lambda_X(t) + \lambda_Y(t)$$


6.9. Let $X_1,\dots,X_n$ be independent exponential random variables
having a common parameter $\lambda$. Determine the distribution of
$\min(X_1,\dots,X_n)$.
6.10. The lifetimes of batteries are independent exponential random 
variables, each having parameter 1. A flashlight needs 2 batteries to 
work. If one has a flashlight and a stockpile of n batteries, what is the 
distribution of time that the flashlight can operate? 
6.11. Let $X_1, X_2, X_3, X_4, X_5$ be independent continuous random
variables having a common distribution function $F$ and density
function $f$, and set

$$I = P\{X_1 < X_2 < X_3 < X_4 < X_5\}$$

a. Show that $I$ does not depend on $F$.
Hint: Write $I$ as a five-dimensional integral and make the
change of variables $u_i = F(x_i)$, $i = 1,\dots,5$.

b. Evaluate $I$.

c. Give an intuitive explanation for your answer to (b).


6.12. Show that the jointly continuous (discrete) random variables
$X_1,\dots,X_n$ are independent if and only if their joint probability density
(mass) function $f(x_1,\dots,x_n)$ can be written as

$$f(x_1,\dots,x_n) = \prod_{i=1}^n g_i(x_i)$$

for nonnegative functions $g_i(x)$, $i = 1,\dots,n$.
6.13. In Example 5e, we computed the conditional density of a
success probability for a sequence of trials when the first n + m trials 
resulted in n successes. Would the conditional density change if we 
specified which n of these trials resulted in successes? 
6.14. Suppose that X and Y are independent geometric random 
variables with the same parameter p. 

a. Without any computations, what do you think is the value of 

P{X =i|X+Y =n}? 


Hint: Imagine that you continually flip a coin having probability 
p of coming up heads. If the second head occurs on the nth 
flip, what is the probability mass function of the time of the first 
head? 

b. Verify your conjecture in part (a). 


6.15. Consider a sequence of independent trials, with each trial being 
a success with probability p. Given that the kth success occurs on 
trial n, show that all possible outcomes of the first n — 1 trials that 
consist of k — 1 successes and n — k failures are equally likely. 
6.16. If X and Y are independent binomial random variables with 
identical parameters n and p, show analytically that the conditional 
distribution of X given that X + Y = m is the hypergeometric 
distribution. Also, give a second argument that yields the same result 
without any computations. 
Hint: Suppose that 2n coins are flipped. Let X¥ denote the number of 
heads in the first n flips and Y the number in the second n flips. 
Argue that given a total of m heads, the number of heads in the first 
n flips has the same distribution as the number of white balls 
selected when a sample of size m is chosen from n white and n 
black balls. 
6.17. Suppose that $X_i$, $i = 1, 2, 3$ are independent Poisson random
variables with respective means $\lambda_i$, $i = 1, 2, 3$. Let $X = X_1 + X_2$ and
$Y = X_2 + X_3$. The random vector X, Y is said to have a bivariate
Poisson distribution. Find its joint probability mass function. That is,
find $P\{X = n, Y = m\}$.
6.18. Suppose X and Y are both integer-valued random variables. 
Let 

\[ p(i \mid j) = P(X = i \mid Y = j) \]


and
\[ q(j \mid i) = P(Y = j \mid X = i) \]


Show that 


\[ P(X = i, Y = j) = \frac{p(i \mid j)}{\sum_{i} p(i \mid j) / q(j \mid i)} \]


6.19. Let \(X_1, X_2, X_3\) be independent and identically distributed
continuous random variables. Compute

a. \(P\{X_1 > X_2 \mid X_1 > X_3\}\);

b. \(P\{X_1 > X_2 \mid X_1 < X_3\}\);

c. \(P\{X_1 > X_2 \mid X_2 > X_3\}\);

d. \(P\{X_1 > X_2 \mid X_2 < X_3\}\).


6.20. Let U denote a random variable uniformly distributed over (0, 1).
Compute the conditional distribution of U given that
a. U > a;
b. U < a;

where 0 < a < 1.

6.21. Suppose that W, the amount of moisture in the air on a given
day, is a gamma random variable with parameters \((t, \beta)\). That is, its
density is \(f(w) = \beta e^{-\beta w}(\beta w)^{t-1}/\Gamma(t)\), \(w > 0\). Suppose also that
given that W = w, the number of accidents during that day, call it N,
has a Poisson distribution with mean w. Show that the conditional
distribution of W given that N = n is the gamma distribution with
parameters \((t + n, \beta + 1)\).

6.22. Let W be a gamma random variable with parameters \((t, \beta)\), and
suppose that conditional on W = w, \(X_1, X_2, \ldots, X_n\) are independent
exponential random variables with rate w. Show that the conditional
distribution of W given that \(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\) is gamma
with parameters \(\left(t + n, \beta + \sum_{i=1}^{n} x_i\right)\).


6.23. A rectangular array of mn numbers arranged in n rows, each 
consisting of m columns, is said to contain a saddlepoint if there is a 
number that is both the minimum of its row and the maximum of its 
column. For instance, in the array 


   1    3    2
   0   -2    6
  .5   12    3


the number 1 in the first row, first column is a saddlepoint. The
existence of a saddlepoint is of significance in the theory of games.
Consider a rectangular array of numbers as described previously and
suppose that there are two individuals, A and B, who are playing
the following game: A is to choose one of the numbers 1, 2, ..., n and
B one of the numbers 1, 2, ..., m. These choices are announced
simultaneously, and if A chose i and B chose j, then A wins from B
the amount specified by the number in the ith row, jth column of the
array. Now suppose that the array contains a saddlepoint, say the
number in row r and column k; call this number \(x_{rk}\). Now if player
A chooses row r, then that player can guarantee herself a win of at
least \(x_{rk}\) (since \(x_{rk}\) is the minimum number in row r). On the
other hand, if player B chooses column k, then he can guarantee
that he will lose no more than \(x_{rk}\) (since \(x_{rk}\) is the maximum number
in column k). Hence, as A has a way of playing that guarantees
her a win of \(x_{rk}\) and as B has a way of playing that guarantees he
will lose no more than \(x_{rk}\), it seems reasonable to take these two
strategies as being optimal and declare that the value of the game to
player A is \(x_{rk}\).
If the nm numbers in the rectangular array described are 
independently chosen from an arbitrary continuous distribution, what 
is the probability that the resulting array will contain a saddlepoint? 
6.24. If X is exponential with rate \(\lambda\), find \(P\{[X] = n, X - [X] \le x\}\),
where [x] is defined as the largest integer less than or equal to x. 
Can you conclude that [X] and X — [X] are independent? 
6.25. Suppose that F(x) is a cumulative distribution function. Show
that (a) \(F^n(x)\) and (b) \(1 - [1 - F(x)]^n\) are also cumulative distribution
functions when n is a positive integer.
Hint: Let \(X_1, \ldots, X_n\) be independent random variables having the
common distribution function F. Define random variables Y and Z in
terms of the \(X_i\) so that \(P\{Y \le x\} = F^n(x)\) and
\(P\{Z \le x\} = 1 - [1 - F(x)]^n\).
6.26. Show that if n people are distributed at random along a road L
miles long, then the probability that no 2 people are less than a
distance D miles apart is, when \(D \le L/(n - 1)\), \([1 - (n - 1)D/L]^n\).
What if \(D > L/(n - 1)\)?
6.27. Suppose that \(X_1, \ldots, X_n\) are independent exponential random
variables with rate \(\lambda\). Find

a. \(f_{X_1 \mid X_1 + \cdots + X_n}(x \mid t)\), the conditional density of \(X_1\) given that
\(X_1 + \cdots + X_n = t\);
b. \(P(X_1 < x \mid X_1 + \cdots + X_n = t)\).
6.28. Establish Equation (6.2) by differentiating Equation
(6.4).
6.29. Show that the median of a sample of size 2n + 1 from a 
uniform distribution on (0, 1) has a beta distribution with parameters 
(n+1,n+1). 
6.30. Suppose that \(X_1, \ldots, X_n\) are independent and identically
distributed continuous random variables. For
\(A = \{X_1 < \cdots < X_j > X_{j+1} > \cdots > X_n\}\), find P(A). That is, find the
probability that the function \(X(i) = X_i\), \(i = 1, \ldots, n\), is a unimodal
function with maximal value X(j). Hint: Write

\[ A = \{\max(X_1, \ldots, X_j) = \max(X_1, \ldots, X_n),\ X_1 < \cdots < X_j,\ X_{j+1} > \cdots > X_n\} \]


6.31. Compute the density of the range of a sample of size n from a 
continuous distribution having density function f. 
6.32. Let \(X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}\) be the ordered values of n
independent uniform (0, 1) random variables. Prove that for
\(1 \le k \le n + 1\),
\[ P\{X_{(k)} - X_{(k-1)} > t\} = (1 - t)^n \]
where \(X_{(0)} = 0\), \(X_{(n+1)} = 1\), and \(0 < t < 1\).
6.33. Let \(X_1, \ldots, X_n\) be a set of independent and identically distributed
continuous random variables having distribution function F, and let
\(X_{(i)}\), \(i = 1, \ldots, n\) denote their ordered values. If X, independent of the
\(X_i\), \(i = 1, \ldots, n\), also has distribution F, determine

a. \(P\{X > X_{(n)}\}\);

b. \(P\{X > X_{(1)}\}\);

c. \(P\{X_{(i)} < X < X_{(j)}\}\), \(1 \le i < j \le n\).


6.34. Let \(X_1, \ldots, X_n\) be independent and identically distributed random
variables having distribution function F and density f. The quantity
\(M = [X_{(1)} + X_{(n)}]/2\), defined to be the average of the smallest and
largest values in \(X_1, \ldots, X_n\), is called the midrange of the sequence.
Show that its distribution function is

\[ F_M(m) = n \int_{-\infty}^{m} [F(2m - x) - F(x)]^{n-1} f(x)\,dx \]
6.35. Let \(X_1, \ldots, X_n\) be independent uniform (0, 1) random variables.
Let \(R = X_{(n)} - X_{(1)}\) denote the range and \(M = [X_{(n)} + X_{(1)}]/2\) the
midrange of \(X_1, \ldots, X_n\). Compute the joint density function of R and M.
6.36. If X and Y are independent standard normal random variables,
determine the joint density function of

\[ U = X, \qquad V = \frac{X}{Y} \]

Then use your result to show that X/Y has a Cauchy distribution.
6.37. Suppose that (X, Y) has a bivariate normal distribution with
parameters \(\mu_x, \mu_y, \sigma_x, \sigma_y, \rho\).

a. Show that \(\left(\frac{X - \mu_x}{\sigma_x}, \frac{Y - \mu_y}{\sigma_y}\right)\) has a bivariate normal distribution
with parameters \(0, 1, 0, 1, \rho\).
b. What is the joint distribution of (aX + b, cY + d)?


6.38. Suppose that X has a beta distribution with parameters (a, b), 
and that the conditional distribution of N given that X = x is binomial 
with parameters (n + m,x). Show that the conditional density of X 
given that N = n is the density of a beta random variable with 
parameters (n + a, m + b). N is said to be a beta binomial random
variable. 


6.39. Consider an experiment with n possible outcomes, having
respective probabilities \(P_1, \ldots, P_n\), \(\sum_{i=1}^{n} P_i = 1\), and suppose we
want to assume a probability distribution on the probability vector
\((P_1, \ldots, P_n)\). Because \(\sum_{i=1}^{n} P_i = 1\), we cannot define a density on
\(P_1, \ldots, P_n\), but what we can do is to define one on \(P_1, \ldots, P_{n-1}\) and
then take \(P_n = 1 - \sum_{i=1}^{n-1} P_i\). The Dirichlet distribution takes
\((P_1, \ldots, P_{n-1})\) to be uniformly distributed over the set
\(S = \{(p_1, \ldots, p_{n-1}) : \sum_{i=1}^{n-1} p_i < 1,\ p_i > 0,\ i = 1, \ldots, n - 1\}\). That is, the
Dirichlet density is

\[ f_{P_1, \ldots, P_{n-1}}(p_1, \ldots, p_{n-1}) = C, \qquad p_i > 0,\ i = 1, \ldots, n - 1,\ \sum_{i=1}^{n-1} p_i < 1 \]

a. Determine C. Hint: Use results from Section 6.3.1.

Let \(U_1, \ldots, U_n\) be independent uniform (0, 1) random variables.
b. Show that the Dirichlet density is the conditional density of
\(U_1, \ldots, U_{n-1}\) given that \(\sum_{i=1}^{n-1} U_i < 1\).

c. Show that \(U_{(1)}, U_{(2)} - U_{(1)}, \ldots, U_{(n)} - U_{(n-1)}\) has a Dirichlet
distribution, where \(U_{(1)}, \ldots, U_{(n)}\) are the order statistics of
\(U_1, \ldots, U_n\).


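6.40. Let \(F_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\) and \(f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\) be, respectively,
the joint distribution function and the joint density function of
\(X_1, \ldots, X_n\).
Show that
\[ \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \]
6.41. For given constants \(c_i > 0\), let \(Y_i = c_i X_i\), \(i = 1, \ldots, n\), and let
\(F_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)\) and \(f_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)\) be, respectively, the joint
distribution function and the joint density function of \(Y_1, \ldots, Y_n\).
a. Express \(F_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)\) in terms of the joint distribution
function of \(X_1, \ldots, X_n\).
b. Express \(f_{Y_1, \ldots, Y_n}(x_1, \ldots, x_n)\) in terms of the joint density
function of \(X_1, \ldots, X_n\).
c. Use Equation (7.3) to verify your answer to part (b).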


Self-Test Problems and Exercises 


6.1. Each throw of an unfair die lands on each of the odd numbers 1, 3, 
5 with probability C and on each of the even numbers with probability
2C.

a. Find C. 

b. Suppose that the die is tossed. Let X equal 1 if the result is an 
even number, and let it be 0 otherwise. Also, let Y equal 1 if the 
result is a number greater than three and let it be 0 otherwise.
Find the joint probability mass function of X and Y. Suppose 
now that 12 independent tosses of the die are made. 

c. Find the probability that each of the six outcomes occurs exactly 
twice. 

d. Find the probability that 4 of the outcomes are either one or two, 
4 are either three or four, and 4 are either five or six. 

e. Find the probability that at least 8 of the tosses land on even 
numbers. 


6.2. The joint probability mass function of the random variables X, Y, Z
is

\[ p(1, 2, 3) = p(2, 1, 1) = p(2, 2, 1) = p(2, 3, 2) = \frac{1}{4} \]
Find (a) E[XYZ], and (b) E[XY + XZ + YZ].
6.3. The joint density of X and Y is given by
\[ f(x, y) = C(y - x)e^{-y}, \qquad -y < x < y,\ 0 < y < \infty \]

a. Find C.
b. Find the density function of X.
c. Find the density function of Y.
d. Find E[X].
e. Find E[Y].


6.4. Let \(r = r_1 + \cdots + r_k\), where all \(r_i\) are positive integers. Argue that if
\(X_1, \ldots, X_r\) has a multinomial distribution, then so does \(Y_1, \ldots, Y_k\) where,
with \(r_0 = 0\),
\[ Y_i = \sum_{j = r_{i-1} + 1}^{r_{i-1} + r_i} X_j \]

That is, \(Y_1\) is the sum of the first \(r_1\) of the X’s, \(Y_2\) is the sum of the next
\(r_2\), and so on.

6.5. Suppose that X, Y, and Z are independent random variables that 
are each equally likely to be either 1 or 2. Find the probability mass 
function of (a) XYZ, (b) XY + XZ + YZ, and (c) \(X^2 + YZ\).

6.6. Let X and Y be continuous random variables with joint density
function

\[ f(x, y) = \begin{cases} \dfrac{x}{5} + cy & 0 < x < 1,\ 1 < y < 5 \\ 0 & \text{otherwise} \end{cases} \]


where c is a constant. 
a. What is the value of c? 
b. Are X and Y independent? 
c. Find P{X + Y > 3}. 


6.7. The joint density function of X and Y is

\[ f(x, y) = \begin{cases} xy & 0 < x < 1,\ 0 < y < 2 \\ 0 & \text{otherwise} \end{cases} \]

a. Are X and Y independent?
b. Find the density function of X.
c. Find the density function of Y.
d. Find the joint distribution function.
e. Find E[Y].
f. Find \(P\{X + Y < 1\}\).


6.8. Consider two components and three types of shocks. A type 1 
shock causes component 1 to fail, a type 2 shock causes component 2 
to fail, and a type 3 shock causes both components 1 and 2 to fail. The 
times until shocks 1, 2, and 3 occur are independent exponential 
random variables with respective rates \(\lambda_1, \lambda_2\), and \(\lambda_3\). Let \(X_i\) denote
the time at which component i fails, i = 1, 2. The random variables
\(X_1, X_2\) are said to have a joint bivariate exponential distribution. Find
\(P\{X_1 > s, X_2 > t\}\).
6.9. Consider a directory of classified advertisements that consists of m 
pages, where m is very large. Suppose that the number of
advertisements per page varies and that your only method of finding 
out how many advertisements there are on a specified page is to count 
them. In addition, suppose that there are too many pages for it to be 
feasible to make a complete count of the total number of 
advertisements and that your objective is to choose a directory 
advertisement in such a way that each of them has an equal chance of 
being selected. 
a. If you randomly choose a page and then randomly choose an 

advertisement from that page, would that satisfy your objective? 

Why or why not? 

Let n(i) denote the number of advertisements on page 

i,i = 1,...,m, and suppose that whereas these quantities are 

unknown, we can assume that they are all less than or equal to 

some specified value n. Consider the following algorithm for 

choosing an advertisement. 

Step 1. Choose a page at random. Suppose it is page X.
Determine n(X) by counting the number of 
advertisements on page X. 


Step 2. “Accept” page X with probability n(X)/n. If page X is 
accepted, go to step 3. Otherwise, return to step 1. 

Step 3. Randomly choose one of the advertisements on
page X.


Call each pass of the algorithm through step 1 an iteration. For 
instance, if the first randomly chosen page is rejected and the 
second accepted, then we would have needed 2 iterations of the 
algorithm to obtain an advertisement. 

b. What is the probability that a single iteration of the algorithm 
results in the acceptance of an advertisement on page i? 


c. What is the probability that a single iteration of the algorithm 
results in the acceptance of an advertisement? 

d. What is the probability that the algorithm goes through k 
iterations, accepting the jth advertisement on page i on the final 
iteration? 

e. What is the probability that the jth advertisement on page i is 
the advertisement obtained from the algorithm? 


f. What is the expected number of iterations taken by the
algorithm?


6.10. The “random” parts of the algorithm in Self-Test Problem 6.9 
can be written in terms of the generated values of a sequence of 
independent uniform (0, 1) random variables, known as random 
numbers. With [x] defined as the largest integer less than or equal to x, 
the first step can be written as follows: 
Step 1. Generate a uniform (0, 1) random variable U. Let 
X = [mU] +1, and determine the value of n(X). 


a. Explain why the above is equivalent to step 1 of Self-Test Problem 6.9.
Hint: What is the probability mass function of X? 
b. Write the remaining steps of the algorithm in a similar style. 


6.11. Let \(X_1, X_2, \ldots\) be a sequence of independent uniform (0, 1) random
variables. For a fixed constant c, define the random variable N by
\[ N = \min\{n : X_n > c\} \]

Is N independent of \(X_N\)? That is, does knowing the value of the first
random variable that is greater than c affect the probability distribution 
of when this random variable occurs? Give an intuitive explanation for 
your answer. 

6.12. The accompanying dartboard is a square whose sides are of 
length 6:
The three circles are all centered at the center of the board and are of
radii 1, 2, and 3, respectively. Darts landing within the circle of radius 1 
score 30 points, those landing outside this circle, but within the circle of 
radius 2, are worth 20 points, and those landing outside the circle of 
radius 2, but within the circle of radius 3, are worth 10 points. Darts that 
do not land within the circle of radius 3 do not score any points. 
Assuming that each dart that you throw will, independently of what 
occurred on your previous throws, land on a point uniformly distributed 
in the square, find the probabilities of the accompanying events: 

a. You score 20 on a throw of the dart. 

b. You score at least 20 on a throw of the dart. 

c. You score 0 on a throw of the dart. 

d. The expected value of your score on a throw of the dart. 

e. Both of your first two throws score at least 10. 

f. Your total score after two throws is 30. 


6.13. A model proposed for NBA basketball supposes that when two 
teams with roughly the same record play each other, the number of 
points scored in a quarter by the home team minus the number scored 
by the visiting team is approximately a normal random variable with 
mean 1.5 and variance 6. In addition, the model supposes that the 
point differentials for the four quarters are independent. Assume that 
this model is correct. 
a. What is the probability that the home team wins? 
b. What is the conditional probability that the home team wins, 
given that it is behind by 5 points at halftime? 
c. What is the conditional probability that the home team wins, 
given that it is ahead by 5 points at the end of the first quarter?
6.14. Let N be a geometric random variable with parameter p.
Suppose that the conditional distribution of X given that N = n is the
gamma distribution with parameters n and \(\lambda\). Find the conditional
probability mass function of N given that X = x. 
6.15. Let X and Y be independent uniform (0, 1) random variables. 
a. Find the joint density of U = X, V = X + Y.
b. Use the result obtained in part (a) to compute the density 
function of V. 


6.16. You and three other people are to place bids for an object, with 
the high bid winning. If you win, you plan to sell the object immediately 
for $10,000. How much should you bid to maximize your expected 
profit if you believe that the bids of the others can be regarded as being 
independent and uniformly distributed between $7,000 and $10,000?
6.17. Find the probability that \(X_1, X_2, \ldots, X_n\) is a permutation of 1, 2, ..., n,
when \(X_1, X_2, \ldots, X_n\) are independent and

a. each is equally likely to be any of the values 1, ..., n;

b. each has the probability mass function

\[ P\{X_i = j\} = p_j, \qquad j = 1, \ldots, n. \]


6.18. Let \(X_1, \ldots, X_n\) and \(Y_1, \ldots, Y_n\) be independent random vectors, with
each vector being a random ordering of k ones and n - k zeros. That
is, their joint probability mass functions are

\[ P\{X_1 = i_1, \ldots, X_n = i_n\} = P\{Y_1 = i_1, \ldots, Y_n = i_n\} = \frac{1}{\binom{n}{k}}, \qquad i_j = 0, 1,\ \sum_{j=1}^{n} i_j = k \]

Let

\[ N = \sum_{i=1}^{n} |X_i - Y_i| \]

denote the number of coordinates at which the two vectors have
different values. Also, let M denote the number of values of i for which
\(X_i = 1\), \(Y_i = 0\).

a. Relate N to M. 

b. What is the distribution of M? 

c. Find E[N]. 

d. Find Var(N).
*6.19. Let \(Z_1, Z_2, \ldots, Z_n\) be independent standard normal random
variables, and let

\[ S_j = \sum_{i=1}^{j} Z_i \]

a. What is the conditional distribution of \(S_n\) given that \(S_k = y\)? Find
it for \(k = 1, \ldots, n - 1\).
b. Show that, for \(1 \le k \le n\), the conditional distribution of \(S_k\) given
that \(S_n = x\) is normal with mean \(xk/n\) and variance \(k(n - k)/n\).


6.20. Let \(X_1, X_2, \ldots\) be a sequence of independent and identically
distributed continuous random variables. Find

a. \(P\{X_6 > X_1 \mid X_1 = \max(X_1, \ldots, X_5)\}\)

b. \(P\{X_6 > X_2 \mid X_1 = \max(X_1, \ldots, X_5)\}\)


6.21. Prove the identity 
\[ P\{X \le s, Y \le t\} = P\{X \le s\} + P\{Y \le t\} + P\{X > s, Y > t\} - 1 \]


Hint: Derive an expression for P{X > s,Y > t} by taking the probability 
of the complementary event. 

6.22. In Example 1c, find \(P(X_r = i, Y_s = j)\) when j < i.

6.23. A Pareto random variable X with parameters \(a > 0\), \(\lambda > 0\) has
distribution function \(F(x) = 1 - a^{\lambda} x^{-\lambda}\), \(x > a\). For \(x_0 > a\), verify
that the conditional distribution of X given that \(X > x_0\) is that of a Pareto
random variable with parameters \((x_0, \lambda)\) by evaluating

\[ P(X > x \mid X > x_0). \]


6.24. Verify the identity \(f_X(x) = \int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y) f_Y(y)\,dy\).


6.25. In a contest originating with n players, each player independently
advances to the next round, with player i advancing with probability \(p_i\).
If no players advance to the next round, then the contest ends and all
the players in the just concluded round are declared co-winners. If only
one player advances, then that player is declared the winner and the
contest ends. If two or more players advance, then those players play
another round. Let \(X_i\) denote the number of rounds that i plays.

a. Find \(P(X_i \ge k)\). Hint: Note that \(\{X_i \ge k\}\) will occur if i advances
at least k times and at least one of the other players advances at
least k - 1 times.

b. Find P(i is either the sole winner or one of the co-winners).

Hint: It might help to imagine that a player always continues to
play rounds until he or she fails to advance. (That is, if there is a
sole winner then imagine that that player continues on until a
failure occurs.)

c. Find P(i is the sole winner).


6.26. Let \(X_1, \ldots, X_n\) be independent nonnegative integer valued random
variables, and let \(a_i = P(X_i \text{ is even})\), \(i = 1, \ldots, n\). With \(S = \sum_{i=1}^{n} X_i\) we
want to determine \(p = P(S \text{ is even})\). Let \(Y_i = 1\) if \(X_i\) is even and let it
equal \(-1\) if \(X_i\) is odd.

In parts (a) and (b) fill in the missing word at the end of the sentence.
a. S is even if and only if the number of \(X_1, \ldots, X_n\) that are odd is

b. S is even if and only if \(\prod_{i=1}^{n} Y_i\) is

c. Find \(E\left[\prod_{i=1}^{n} Y_i\right]\).

d. Find P(S is even). Hint: Use parts (b) and (c).


Chapter 7 Properties of Expectation


7.1 Introduction
In this chapter, we develop and exploit additional properties of expected values. To
begin, recall that the expected value of the random variable X is defined by

\[ E[X] = \sum_{x} x\,p(x) \]

when X is a discrete random variable with probability mass function p(x), and by
\[ E[X] = \int_{-\infty}^{\infty} x f(x)\,dx \]


when X is a continuous random variable with probability density function f(x). 


Since E[X] is a weighted average of the possible values of X, it follows that if X must 
lie between a and b, then so must its expected value. That is, if 


\[ P\{a \le X \le b\} = 1 \]

then
\[ a \le E[X] \le b \]
To verify the preceding statement, suppose that X is a discrete random variable for
which \(P\{a \le X \le b\} = 1\). Since this implies that p(x) = 0 for all x outside of the
interval [a, b], it follows that

\[ E[X] = \sum_{x:\, p(x) > 0} x\,p(x) \ge \sum_{x:\, p(x) > 0} a\,p(x) = a \sum_{x:\, p(x) > 0} p(x) = a \]

In the same manner, it can be shown that \(E[X] \le b\), so the result follows for discrete
random variables. As the proof in the continuous case is similar, the result follows.


7.2 Expectation of Sums of Random Variables
For a two-dimensional analog of Proposition 4.1 of Chapter 4 and Proposition 2.1 of
Chapter 5, which give the computational formulas for the expected value of a
function of a random variable, suppose that X and Y are random variables and g is a
function of two variables. Then we have the following result.


Proposition 2.1 


If X and Y have a joint probability mass function p(x,y), then 


\[ E[g(X, Y)] = \sum_{y} \sum_{x} g(x, y)\,p(x, y) \]
If X and Y have a joint probability density function f(x,y), then
\[ E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy \]


Let us give a proof of Proposition 2.1 when the random variables X and Y are
jointly continuous with joint density function f(x,y) and when g(X,Y) is a
nonnegative random variable. Because \(g(X, Y) \ge 0\), we have, by Lemma 2.1 of
Chapter 5, that

\[ E[g(X, Y)] = \int_{0}^{\infty} P\{g(X, Y) > t\}\,dt \]

Writing

\[ P\{g(X, Y) > t\} = \iint_{(x, y):\, g(x, y) > t} f(x, y)\,dy\,dx \]

shows that

\[ E[g(X, Y)] = \int_{0}^{\infty} \iint_{(x, y):\, g(x, y) > t} f(x, y)\,dy\,dx\,dt \]

Interchanging the order of integration gives

\[ E[g(X, Y)] = \int_{x} \int_{y} \int_{t=0}^{g(x, y)} f(x, y)\,dt\,dy\,dx = \int_{x} \int_{y} g(x, y) f(x, y)\,dy\,dx \]

Thus, the result is proven when g(X,Y) is a nonnegative random variable. The
general case then follows as in the one-dimensional case. (See Theoretical
Exercises 2 and 3 of Chapter 5.)


Example 2a 


An accident occurs at a point X that is uniformly distributed on a road of length L. 
At the time of the accident, an ambulance is at a location Y that is also uniformly 
distributed on the road. Assuming that X and Y are independent, find the 
expected distance between the ambulance and the point of the accident. 
Solution 


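We need to compute E[|X - Y|]. Since the joint density function of X and Y is

\[ f(x, y) = \frac{1}{L^2}, \qquad 0 < x < L,\ 0 < y < L \]

it follows from Proposition 2.1 that

\[ E[|X - Y|] = \frac{1}{L^2} \int_{0}^{L} \int_{0}^{L} |x - y|\,dy\,dx \]

Now,
\[ \begin{aligned}
\int_{0}^{L} |x - y|\,dy &= \int_{0}^{x} (x - y)\,dy + \int_{x}^{L} (y - x)\,dy \\
&= \frac{x^2}{2} + \frac{L^2}{2} - \frac{x^2}{2} - x(L - x) \\
&= \frac{L^2}{2} + x^2 - xL
\end{aligned} \]
Therefore,

\[ E[|X - Y|] = \frac{1}{L^2} \int_{0}^{L} \left( \frac{L^2}{2} + x^2 - xL \right) dx = \frac{L}{3} \]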
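As a numerical sanity check on the value L/3, the following is a minimal Monte Carlo sketch. The function name `mean_abs_gap`, the sample size, and the seed are illustrative assumptions, not part of the text.

```python
import random


def mean_abs_gap(road_length, samples=200_000, seed=0):
    """Monte Carlo estimate of E[|X - Y|] for independent uniforms on (0, road_length)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    total = 0.0
    for _ in range(samples):
        x = rng.uniform(0.0, road_length)  # accident location
        y = rng.uniform(0.0, road_length)  # ambulance location
        total += abs(x - y)
    return total / samples


if __name__ == "__main__":
    L = 3.0
    # The estimate should be close to the exact value L / 3.
    print(mean_abs_gap(L), L / 3)
```

The estimate fluctuates around L/3 with an error that shrinks like one over the square root of the number of samples.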


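For an important application of Proposition 2.1, suppose that E[X] and E[Y] are
both finite and let g(X, Y) = X + Y. Then, in the continuous case,

\[ \begin{aligned}
E[X + Y] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x + y) f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y)\,dy\,dx + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} x f_X(x)\,dx + \int_{-\infty}^{\infty} y f_Y(y)\,dy \\
&= E[X] + E[Y]
\end{aligned} \]

The same result holds in general; thus, whenever E[X] and E[Y] are finite,

(2.1)
\[ E[X + Y] = E[X] + E[Y] \]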
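Equation (2.1) does not require independence. The sketch below checks it exactly in the discrete case, using the double-sum form of Proposition 2.1. The joint probability mass function `joint_pmf` is a made-up example of a dependent pair, and the helper name `expect` is an assumption for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf of a dependent pair (X, Y); the values are illustrative only.
joint_pmf = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(3, 10),
}


def expect(g, pmf):
    """E[g(X, Y)] as the sum over (x, y) of g(x, y) p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


e_x = expect(lambda x, y: x, joint_pmf)        # E[X]
e_y = expect(lambda x, y: y, joint_pmf)        # E[Y]
e_sum = expect(lambda x, y: x + y, joint_pmf)  # E[X + Y]

if __name__ == "__main__":
    print(e_x, e_y, e_sum)
```

Here P{X = 1, Y = 1} differs from P{X = 1}P{Y = 1}, so X and Y are dependent, yet `e_sum` equals `e_x + e_y`.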
Example 2b 
Suppose that for random variables X and Y, 
\[ X \ge Y \]
That is, for any outcome of the probability experiment, the value of the random
variable X is greater than or equal to the value of the random variable Y. Since
\(X \ge Y\) is equivalent to the inequality \(X - Y \ge 0\), it follows that \(E[X - Y] \ge 0\), or,
equivalently,

\[ E[X] \ge E[Y] \]


Using Equation (2.1), we may show by a simple induction proof that if \(E[X_i]\) is
finite for all \(i = 1, \ldots, n\), then


(2.2)
\[ E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n] \]




Equation (2.2) is an extremely useful formula whose utility will now be illustrated 
by a series of examples. 


Example 2c The sample mean 


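Let \(X_1, \ldots, X_n\) be independent and identically distributed random variables having
distribution function F and expected value \(\mu\). Such a sequence of random
variables is said to constitute a sample from the distribution F. The quantity

\[ \bar{X} = \sum_{i=1}^{n} \frac{X_i}{n} \]
is called the sample mean. Compute \(E[\bar{X}]\).
Solution
\[ \begin{aligned}
E[\bar{X}] &= E\left[ \sum_{i=1}^{n} \frac{X_i}{n} \right] \\
&= \frac{1}{n} E\left[ \sum_{i=1}^{n} X_i \right] \\
&= \frac{1}{n} \sum_{i=1}^{n} E[X_i] \\
&= \mu \qquad \text{since } E[X_i] = \mu
\end{aligned} \]

That is, the expected value of the sample mean is \(\mu\), the mean of the distribution.
When the distribution mean \(\mu\) is unknown, the sample mean is often used in
statistics to estimate it.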


Example 2d Boole’s inequality 


Let \(A_1, \ldots, A_n\) denote events, and define the indicator variables \(X_i\), \(i = 1, \ldots, n\), by

\[ X_i = \begin{cases} 1 & \text{if } A_i \text{ occurs} \\ 0 & \text{otherwise} \end{cases} \]

Let

\[ X = \sum_{i=1}^{n} X_i \]

so X denotes the number of the events \(A_i\) that occur. Finally, let

\[ Y = \begin{cases} 1 & \text{if } X \ge 1 \\ 0 & \text{otherwise} \end{cases} \]

so Y is equal to 1 if at least one of the \(A_i\) occurs and is 0 otherwise. Now, it is
immediate that

\[ X \ge Y \]
so
\[ E[X] \ge E[Y] \]
But since
\[ E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} P(A_i) \]
and

\[ E[Y] = P\{\text{at least one of the } A_i \text{ occur}\} = P\left( \bigcup_{i=1}^{n} A_i \right) \]

we obtain Boole’s inequality, namely,

\[ P\left( \bigcup_{i=1}^{n} A_i \right) \le \sum_{i=1}^{n} P(A_i) \]

The next three examples show how Equation (2.2) can be used to calculate the
expected value of binomial, negative binomial, and hypergeometric random
variables. These derivations should be compared with those presented in Chapter
4.


Example 2e Expectation of a binomial random variable 


Let X be a binomial random variable with parameters n and p. Recalling that
such a random variable represents the number of successes in n independent
trials when each trial has probability p of being a success, we have that

\[ X = X_1 + X_2 + \cdots + X_n \]

where
\[ X_i = \begin{cases} 1 & \text{if the ith trial is a success} \\ 0 & \text{if the ith trial is a failure} \end{cases} \]

Hence, \(X_i\) is a Bernoulli random variable having expectation
\(E[X_i] = 1(p) + 0(1 - p) = p\). Thus,

\[ E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = np \]
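For comparison with the indicator argument, the mean can also be computed by summing k P{X = k} over the binomial mass function. The sketch below does this in exact rational arithmetic; the function name `binomial_mean_from_pmf` and the chosen parameters are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def binomial_mean_from_pmf(n, p):
    """Sum of k * C(n, k) * p^k * (1 - p)^(n - k) over k = 0, ..., n."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


if __name__ == "__main__":
    n, p = 10, Fraction(3, 10)
    # Both values should agree: the direct sum and the indicator result n * p.
    print(binomial_mean_from_pmf(n, p), n * p)
```

Passing p as a `Fraction` keeps the comparison exact, so no floating-point tolerance is needed.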


Example 2f Mean of a negative binomial random variable 


If independent trials having a constant probability p of being successes are 
performed, determine the expected number of trials required to amass a total of r 
successes. 


Solution 


If X denotes the number of trials needed to amass a total of r successes, then X
is a negative binomial random variable that can be represented by

\[ X = X_1 + X_2 + \cdots + X_r \]
where \(X_1\) is the number of trials required to obtain the first success, \(X_2\) the
number of additional trials until the second success is obtained, \(X_3\) the number of
additional trials until the third success is obtained, and so on. That is, \(X_i\)
represents the number of additional trials required after the (i - 1)st success until a
total of i successes is amassed. A little thought reveals that each of the random
variables \(X_i\) is a geometric random variable with parameter p. Hence, from the
results of Example 8b of Chapter 4, \(E[X_i] = 1/p\), \(i = 1, 2, \ldots, r\); thus,

\[ E[X] = E[X_1] + \cdots + E[X_r] = \frac{r}{p} \]
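A short simulation can illustrate the value r/p. This is only a sketch: the function names, the sample size, and the seed are assumptions made for the illustration.

```python
import random


def trials_until_r_successes(r, p, rng):
    """Run independent trials with success probability p until r successes occur."""
    trials = 0
    successes = 0
    while successes < r:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials


def estimate_mean_trials(r, p, samples=100_000, seed=1):
    """Average number of trials over many repetitions (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return sum(trials_until_r_successes(r, p, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    r, p = 3, 0.25
    # The estimate should be close to r / p.
    print(estimate_mean_trials(r, p), r / p)
```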


Example 2g Mean of a hypergeometric random variable 


If n balls are randomly selected from an urn containing N balls of which m are 
white, find the expected number of white balls selected. 


Solution 


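Let X denote the number of white balls selected, and represent X as

\[ X = X_1 + \cdots + X_m \]

where

\[ X_i = \begin{cases} 1 & \text{if the ith white ball is selected} \\ 0 & \text{otherwise} \end{cases} \]

Now
\[ \begin{aligned}
E[X_i] &= P\{X_i = 1\} \\
&= P\{\text{ith white ball is selected}\} \\
&= \frac{\binom{1}{1} \binom{N - 1}{n - 1}}{\binom{N}{n}} \\
&= \frac{n}{N}
\end{aligned} \]
Hence,
\[ E[X] = E[X_1] + \cdots + E[X_m] = \frac{mn}{N} \]

We could also have obtained the preceding result by using the alternative
representation

\[ X = Y_1 + \cdots + Y_n \]
where

\[ Y_i = \begin{cases} 1 & \text{if the ith ball selected is white} \\ 0 & \text{otherwise} \end{cases} \]

Since the ith ball selected is equally likely to be any of the N balls, it follows that

\[ E[Y_i] = \frac{m}{N} \]

so

\[ E[X] = E[Y_1] + \cdots + E[Y_n] = \frac{nm}{N} \]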
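The value nm/N can be cross-checked against the hypergeometric mass function directly. The following sketch sums k P{X = k} exactly; the function name and the parameter values in the usage lines are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def hypergeometric_mean_from_pmf(N, m, n):
    """Mean number of white balls when n balls are drawn from N, of which m are white."""
    total = comb(N, n)  # number of equally likely samples of size n
    return sum(
        Fraction(k * comb(m, k) * comb(N - m, n - k), total)
        for k in range(min(m, n) + 1)
    )


if __name__ == "__main__":
    N, m, n = 20, 7, 5
    # Both values should agree: the direct sum and n * m / N.
    print(hypergeometric_mean_from_pmf(N, m, n), Fraction(n * m, N))
```

`math.comb` returns 0 when the lower index exceeds the upper one, so impossible values of k contribute nothing to the sum.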


Example 2h Expected number of matches 


Suppose that N people throw their hats into the center of a room. The hats are 
mixed up, and each person randomly selects one. Find the expected number of 
people who select their own hat.
Solution


Letting X denote the number of matches, we can compute E[X] most easily by
writing

\[ X = X_1 + X_2 + \cdots + X_N \]
where

\[ X_i = \begin{cases} 1 & \text{if the ith person selects his own hat} \\ 0 & \text{otherwise} \end{cases} \]

Since, for each i, the ith person is equally likely to select any of the N hats,

\[ E[X_i] = P\{X_i = 1\} = \frac{1}{N} \]
Thus,

\[ E[X] = E[X_1] + \cdots + E[X_N] = \left( \frac{1}{N} \right) N = 1 \]

Hence, on the average, exactly one person selects his own hat.
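For small N the result can be confirmed by brute force, averaging the number of fixed points over all N! equally likely assignments of hats. This is a sketch; the function name `expected_matches` is an assumption.

```python
from fractions import Fraction
from itertools import permutations


def expected_matches(N):
    """Exact mean number of people who get their own hat, by enumerating all permutations."""
    total_matches = 0
    count = 0
    for perm in permutations(range(N)):
        # perm[person] is the hat that this person selects
        total_matches += sum(1 for person, hat in enumerate(perm) if person == hat)
        count += 1
    return Fraction(total_matches, count)


if __name__ == "__main__":
    for N in range(1, 7):
        print(N, expected_matches(N))
```

Enumeration is only practical for small N, since the number of permutations grows factorially.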


Example 2i Coupon-collecting problems 


Suppose that there are N types of coupons, and each time one obtains a coupon, 
it is equally likely to be any one of the N types. Find the expected number of 
coupons one needs to amass before obtaining a complete set of at least one of 
each type. 


Solution 


Let X denote the number of coupons collected before a complete set is attained.
We compute E[X] by using the same technique we used in computing the mean
of a negative binomial random variable (Example 2f). That is, we define
\(X_i\), \(i = 0, 1, \ldots, N - 1\) to be the number of additional coupons that need be obtained
after i distinct types have been collected in order to obtain another distinct type,
and we note that

\[ X = X_0 + X_1 + \cdots + X_{N-1} \]

When i distinct types of coupons have already been collected, a new coupon
obtained will be of a distinct type with probability (N - i)/N. Therefore,

\[ P\{X_i = k\} = \frac{N - i}{N} \left( \frac{i}{N} \right)^{k-1}, \qquad k \ge 1 \]

or, in other words, \(X_i\) is a geometric random variable with parameter (N - i)/N.
Hence,
\[ E[X_i] = \frac{N}{N - i} \]
implying that
\[ \begin{aligned}
E[X] &= 1 + \frac{N}{N - 1} + \frac{N}{N - 2} + \cdots + \frac{N}{1} \\
&= N \left[ 1 + \cdots + \frac{1}{N - 1} + \frac{1}{N} \right]
\end{aligned} \]
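The closed form is easy to evaluate and to compare with a simulation. In the sketch below, `expected_coupons` evaluates N(1 + 1/2 + ... + 1/N) exactly and `estimate_coupons` simulates the collection process; all names, the sample size, and the seed are illustrative assumptions.

```python
from fractions import Fraction
import random


def expected_coupons(N):
    """Exact value of N * (1 + 1/2 + ... + 1/N)."""
    return N * sum(Fraction(1, k) for k in range(1, N + 1))


def simulate_coupons(N, rng):
    """Draw coupon types uniformly at random until all N types have been seen."""
    seen = set()
    draws = 0
    while len(seen) < N:
        seen.add(rng.randrange(N))
        draws += 1
    return draws


def estimate_coupons(N, samples=50_000, seed=2):
    """Average number of draws over many simulated collections."""
    rng = random.Random(seed)
    return sum(simulate_coupons(N, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    N = 6
    print(float(expected_coupons(N)), estimate_coupons(N))
```

For N = 6 (for example, waiting to see every face of a fair die), the exact value is 147/10.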
Example 2j 


Ten hunters are waiting for ducks to fly by. When a flock of ducks flies overhead, 
the hunters fire at the same time, but each chooses his target at random, 
independently of the others. If each hunter independently hits his target with 
probability p, compute the expected number of ducks that escape unhurt when a 
flock of size 10 flies overhead. 


Solution 


Let \(X_i\) equal 1 if the ith duck escapes unhurt and 0 otherwise, for i = 1, 2, ..., 10.
The expected number of ducks to escape can be expressed as

\[ E[X_1 + \cdots + X_{10}] = E[X_1] + \cdots + E[X_{10}] \]

To compute \(E[X_i] = P\{X_i = 1\}\), we note that each of the hunters will,
independently, hit the ith duck with probability p/10, so

\[ P\{X_i = 1\} = \left( 1 - \frac{p}{10} \right)^{10} \]

Hence,

\[ E[X] = 10 \left( 1 - \frac{p}{10} \right)^{10} \]
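The indicator argument can be checked by simulating the hunters directly. The sketch below assumes, as in the example, that each hunter picks a target uniformly at random and hits it with probability p; the function names, the default of 10 ducks and 10 hunters as keyword parameters, the sample size, and the seed are illustrative choices.

```python
import random


def expected_escapes(p, ducks=10, hunters=10):
    """Closed form: each duck escapes all hunters with probability (1 - p/ducks)^hunters."""
    return ducks * (1 - p / ducks) ** hunters


def simulate_escapes(p, rng, ducks=10, hunters=10):
    """One volley: every hunter picks a random duck and hits it with probability p."""
    hit = set()
    for _ in range(hunters):
        target = rng.randrange(ducks)
        if rng.random() < p:
            hit.add(target)
    return ducks - len(hit)


def estimate_escapes(p, samples=50_000, seed=3):
    """Average number of unhurt ducks over many simulated volleys."""
    rng = random.Random(seed)
    return sum(simulate_escapes(p, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    p = 0.5
    print(expected_escapes(p), estimate_escapes(p))
```

Note that the escape indicators are dependent (a hunter who fires at one duck cannot fire at another), which is exactly why linearity of expectation is the convenient tool here.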


Example 2k Expected number of runs


Suppose that a sequence of n 1's and m 0's is randomly permuted so that each
of the $(n + m)!/(n!\,m!)$ possible arrangements is equally likely. Any consecutive
string of 1's is said to constitute a run of 1's. For instance, if $n = 6$, $m = 4$, and the
ordering is 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, then there are 3 runs of 1's. We are
interested in computing the mean number of such runs. To compute this quantity,
let

$$I_i = \begin{cases} 1 & \text{if a run of 1's starts at the ith position} \\ 0 & \text{otherwise} \end{cases}$$

Therefore, R(1), the number of runs of 1's, can be expressed as

$$R(1) = \sum_{i=1}^{n+m} I_i$$

and it follows that

$$E[R(1)] = \sum_{i=1}^{n+m} E[I_i]$$

Now,

$$E[I_1] = P\{\text{"1" in position 1}\} = \frac{n}{n+m}$$

and for $1 < i \le n + m$,

$$E[I_i] = P\{\text{"0" in position } i-1, \text{"1" in position } i\} = \frac{m}{n+m}\,\frac{n}{n+m-1}$$

Hence,

$$E[R(1)] = \frac{n}{n+m} + (n+m-1)\frac{nm}{(n+m)(n+m-1)}$$

Similarly, E[R(0)], the expected number of runs of 0's, is

$$E[R(0)] = \frac{m}{n+m} + \frac{nm}{n+m}$$

and the expected number of runs of either type is

$$E[R(1) + R(0)] = 1 + \frac{2nm}{n+m}$$
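
For small n and m the mean number of runs can be confirmed by listing every arrangement. The sketch below uses illustrative helper names that are not from the text. It counts runs of either type and averages over all placements of the 1's, which should agree with 1 + 2nm/(n + m).

```python
from fractions import Fraction
from itertools import combinations

def count_runs(seq):
    """Number of maximal blocks of equal symbols in a nonempty sequence."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def mean_runs_bruteforce(n, m):
    """Average total number of runs over all arrangements of n 1's and m 0's."""
    total, count = 0, 0
    for ones in combinations(range(n + m), n):   # positions occupied by the 1's
        seq = [0] * (n + m)
        for pos in ones:
            seq[pos] = 1
        total += count_runs(seq)
        count += 1
    return Fraction(total, count)
```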


Example 2l A random walk in the plane


Consider a particle initially located at a given point in the plane, and suppose that
it undergoes a sequence of steps of fixed length, but in a completely random
direction. Specifically, suppose that the new position after each step is one unit of
distance from the previous position and at an angle of orientation from the
previous position that is uniformly distributed over $(0, 2\pi)$. (See Figure 7.1.)
Compute the expected square of the distance from the origin after n steps.


Figure 7.1 [Figure not reproduced. Labeled points: (0) = initial position, (1) = position after first step, (2) = position after second step.]

Solution 


Letting $(X_i, Y_i)$ denote the change in position at the ith step, $i = 1, \ldots, n$, in
rectangular coordinates, we have

$$X_i = \cos\theta_i \qquad Y_i = \sin\theta_i$$

where $\theta_i$, $i = 1, \ldots, n$, are, by assumption, independent uniform $(0, 2\pi)$ random
variables. Because the position after n steps has rectangular coordinates
$\left(\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} Y_i\right)$, it follows that $D^2$, the square of the distance from the origin,
is given by

$$\begin{aligned}
D^2 &= \left(\sum_{i=1}^{n} X_i\right)^2 + \left(\sum_{i=1}^{n} Y_i\right)^2 \\
&= \sum_{i=1}^{n} (X_i^2 + Y_i^2) + \sum\sum_{i \ne j} (X_iX_j + Y_iY_j) \\
&= n + \sum\sum_{i \ne j} (\cos\theta_i\cos\theta_j + \sin\theta_i\sin\theta_j)
\end{aligned}$$

where $\cos^2\theta_i + \sin^2\theta_i = 1$. Taking expectations and using the independence of
$\theta_i$ and $\theta_j$ when $i \ne j$ and the fact that

$$2\pi E[\cos\theta_i] = \int_0^{2\pi} \cos u \, du = \sin 2\pi - \sin 0 = 0$$

$$2\pi E[\sin\theta_i] = \int_0^{2\pi} \sin u \, du = \cos 0 - \cos 2\pi = 0$$

we arrive at

$$E[D^2] = n$$
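
The result can be checked numerically without simulation. The sketch below makes an assumption that departs from the text: the continuous uniform angle is replaced by a discrete uniform choice among k equally spaced directions. The cosines and sines still average to zero over such a grid, so the same argument applies and the mean squared distance should again equal n, up to rounding. The function name and the default k are illustrative.

```python
import cmath
import math
from itertools import product

def mean_square_distance(n, k=24):
    """Average of D^2 over all k^n step sequences with k equally spaced directions."""
    angles = [2 * math.pi * j / k for j in range(k)]
    total = 0.0
    for thetas in product(angles, repeat=n):
        endpoint = sum(cmath.exp(1j * t) for t in thetas)  # unit steps as complex numbers
        total += abs(endpoint) ** 2
    return total / k ** n
```

The cost is $k^n$ evaluations, so this is a check for small n only.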


Example 2m Analyzing the quick-sort algorithm 


Suppose that we are presented with a set of n distinct values $x_1, x_2, \ldots, x_n$ and
that we desire to put them in increasing order, or as it is commonly stated, to sort
them. An efficient procedure for accomplishing this task is the quick-sort
algorithm, which is defined as follows: When n = 2, the algorithm compares the
two values and then puts them in the appropriate order. When n > 2, one of the
elements is randomly chosen, say it is $x_i$, and then all of the other values are
compared with $x_i$. Those smaller than $x_i$ are put in a bracket to the left of $x_i$ and
those larger than $x_i$ are put in a bracket to the right of $x_i$. The algorithm then
repeats itself on these brackets and continues until all values have been sorted.
For instance, suppose that we desire to sort the following 10 distinct values:


5, 9, 3, 10, 11, 14, 8, 4, 17, 6


We start by choosing one of them at random (that is, each value has probability
1/10 of being chosen). Suppose, for instance, that the value 10 is chosen. We then
compare each of the others to this value, putting in a bracket to the left of 10 all
those values smaller than 10 and to the right all those larger. This gives


{5,9,3,8,4,6},10,{11,14,17} 


We now focus on a bracketed set that contains more than a single value (say, the
one on the left of the preceding) and randomly choose one of its values (say that
6 is chosen). Comparing each of the values in the bracket with 6 and putting the
smaller ones in a new bracket to the left of 6 and the larger ones in a bracket to
the right of 6 gives


{5,3,4},6,{9,8},10,{11,14,17} 


If we now consider the leftmost bracket, and randomly choose the value 4 for 
comparison, then the next iteration yields 


{3},4,{5},6,{9,8},10,{11,14,17} 


This continues until there is no bracketed set that contains more than a single 
value. 


If we let X denote the number of comparisons that it takes the quick-sort
algorithm to sort n distinct numbers, then E[X] is a measure of the effectiveness
of this algorithm. To compute E[X], we will first express X as a sum of other
random variables as follows. To begin, give the following names to the values
that are to be sorted: Let 1 stand for the smallest, let 2 stand for the next
smallest, and so on. Then, for $1 \le i < j \le n$, let $I(i, j)$ equal 1 if i and j are ever
directly compared, and let it equal 0 otherwise. With this definition, it follows that

$$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(i, j)$$

implying that

$$\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(i, j)\right] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[I(i, j)] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} P\{i \text{ and } j \text{ are ever compared}\}
\end{aligned}$$


To determine the probability that i and j are ever compared, note that the values
$i, i+1, \ldots, j-1, j$ will initially be in the same bracket (since all values are initially
in the same bracket) and will remain in the same bracket if the number chosen
for the first comparison is not between i and j. For instance, if the comparison
number is larger than j, then all the values $i, i+1, \ldots, j-1, j$ will go in a bracket to
the left of the comparison number, and if it is smaller than i, then they will all go
in a bracket to the right. Thus all the values $i, i+1, \ldots, j-1, j$ will remain in the
same bracket until the first time that one of them is chosen as a comparison
value. At that point all the other values between i and j will be compared with this
comparison value. Now, if this comparison value is neither i nor j, then upon
comparison with it, i will go into a left bracket and j into a right bracket, and thus i
and j will be in different brackets and so will never be compared. On the other
hand, if the comparison value of the set $i, i+1, \ldots, j-1, j$ is either i or j, then
there will be a direct comparison between i and j. Now, given that the
comparison value is one of the values between i and j, it follows that it is equally
likely to be any of these $j - i + 1$ values, and thus the probability that it is either i
or j is $2/(j - i + 1)$. Therefore, we can conclude that

$$P\{i \text{ and } j \text{ are ever compared}\} = \frac{2}{j-i+1}$$

and

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}$$

To obtain a rough approximation of the magnitude of E[X] when n is large, we
can approximate the sums by integrals. Now

$$\begin{aligned}
\sum_{j=i+1}^{n} \frac{2}{j-i+1} &\approx \int_{i+1}^{n} \frac{2}{x-i+1}\,dx \\
&= 2\log(x-i+1)\Big|_{i+1}^{n} \\
&= 2\log(n-i+1) - 2\log(2) \\
&\approx 2\log(n-i+1)
\end{aligned}$$

Thus

$$\begin{aligned}
E[X] &\approx \sum_{i=1}^{n-1} 2\log(n-i+1) \\
&\approx 2\int_{1}^{n-1} \log(n-x+1)\,dx \\
&= 2\int_{2}^{n} \log(y)\,dy \\
&= 2(y\log(y) - y)\Big|_{2}^{n} \\
&\approx 2n\log(n)
\end{aligned}$$


Thus we see that when n is large, the quick-sort algorithm requires, on average, 
approximately 2n log(n) comparisons to sort n distinct values. 
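
The double sum for E[X] can be evaluated exactly and cross-checked against an independent argument. The sketch below uses illustrative function names that are not from the text. It computes the sum of 2/(j - i + 1) over all pairs, and separately a recursion obtained by conditioning on the rank of the first comparison value: a bracket of size s costs s - 1 comparisons and then splits into brackets of sizes k and s - 1 - k, each split equally likely. The recursion is an assumption layered on the algorithm as described, not a formula from the text.

```python
from fractions import Fraction
from math import log

def expected_comparisons_sum(n):
    """E[X] as the sum over i < j of 2 / (j - i + 1)."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n) for j in range(i + 1, n + 1))

def expected_comparisons_recursive(n):
    """Condition on the first pivot: c[s] = (s - 1) + (2/s) * (c[0] + ... + c[s-1])."""
    c = [Fraction(0)] * (max(n, 1) + 1)
    for size in range(2, n + 1):
        c[size] = (size - 1) + Fraction(2, size) * sum(c[:size])
    return c[n]

def rough_estimate(n):
    """The large-n approximation 2 n log(n), with the natural logarithm."""
    return 2 * n * log(n)
```

Comparing `float(expected_comparisons_sum(n))` with `rough_estimate(n)` for moderate n shows how rough the integral approximation is; the two agree only to leading order as n grows.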


Example 2n The probability of a union of events 


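Let $A_1, \ldots, A_n$ denote events, and define the indicator variables $X_i$, $i = 1, \ldots, n$, by

$$X_i = \begin{cases} 1 & \text{if } A_i \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

Now, note that

$$1 - \prod_{i=1}^{n} (1 - X_i) = \begin{cases} 1 & \text{if } \bigcup_i A_i \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

Hence,

$$E\left[1 - \prod_{i=1}^{n} (1 - X_i)\right] = P\left(\bigcup_{i=1}^{n} A_i\right)$$

Expanding the left side of the preceding formula yields

$$P\left(\bigcup_{i=1}^{n} A_i\right) = E\left[\sum_{i=1}^{n} X_i - \sum\sum_{i<j} X_iX_j + \sum\sum\sum_{i<j<k} X_iX_jX_k - \cdots + (-1)^{n+1} X_1 \cdots X_n\right] \tag{2.3}$$

However,

$$X_{i_1}X_{i_2} \cdots X_{i_k} = \begin{cases} 1 & \text{if } A_{i_1}A_{i_2} \cdots A_{i_k} \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

so

$$E[X_{i_1} \cdots X_{i_k}] = P(A_{i_1} \cdots A_{i_k})$$

Thus, Equation (2.3) is just a statement of the well-known inclusion-exclusion
formula for the union of events:

$$P\left(\bigcup_i A_i\right) = \sum_i P(A_i) - \sum\sum_{i<j} P(A_iA_j) + \sum\sum\sum_{i<j<k} P(A_iA_jA_k) - \cdots + (-1)^{n+1} P(A_1 \cdots A_n)$$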


When one is dealing with an infinite collection of random variables $X_i$, $i \ge 1$, each
having a finite expectation, it is not necessarily true that

$$E\left[\sum_{i=1}^{\infty} X_i\right] = \sum_{i=1}^{\infty} E[X_i] \tag{2.4}$$

To determine when (2.4) is valid, we note that $\sum_{i=1}^{\infty} X_i = \lim_{n \to \infty} \sum_{i=1}^{n} X_i$. Thus,

$$\begin{aligned}
E\left[\sum_{i=1}^{\infty} X_i\right] &= E\left[\lim_{n \to \infty} \sum_{i=1}^{n} X_i\right] \\
&\overset{?}{=} \lim_{n \to \infty} E\left[\sum_{i=1}^{n} X_i\right] \\
&= \lim_{n \to \infty} \sum_{i=1}^{n} E[X_i] \\
&= \sum_{i=1}^{\infty} E[X_i]
\end{aligned} \tag{2.5}$$

Hence, Equation (2.4) is valid whenever we are justified in interchanging the
expectation and limit operations in Equation (2.5). Although, in general, this
interchange is not justified, it can be shown to be valid in two important special
cases:


1. The $X_i$ are all nonnegative random variables. (That is, $P\{X_i \ge 0\} = 1$ for all i.)


2. $\sum_{i=1}^{\infty} E[|X_i|] < \infty$.


Example 20 


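Consider any nonnegative, integer-valued random variable X. If, for each $i \ge 1$,
we define

$$X_i = \begin{cases} 1 & \text{if } X \ge i \\ 0 & \text{if } X < i \end{cases}$$

then

$$\sum_{i=1}^{\infty} X_i = \sum_{i=1}^{X} X_i + \sum_{i=X+1}^{\infty} X_i = \sum_{i=1}^{X} 1 + \sum_{i=X+1}^{\infty} 0 = X$$

Hence, since the $X_i$ are all nonnegative, we obtain

$$E[X] = \sum_{i=1}^{\infty} E[X_i] = \sum_{i=1}^{\infty} P\{X \ge i\} \tag{2.6}$$

a useful identity.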
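
The identity is easy to check on any finite probability mass function. The sketch below uses hypothetical helper names, and the pmf in the usage note is made up for illustration. It compares the direct mean with the sum of the tail probabilities $P\{X \ge i\}$.

```python
from fractions import Fraction

def mean_direct(pmf):
    """E[X] computed as the sum of k * P{X = k}."""
    return sum(k * p for k, p in pmf.items())

def mean_from_tails(pmf):
    """E[X] computed as the sum over i >= 1 of P{X >= i}."""
    top = max(pmf)
    return sum(sum(p for k, p in pmf.items() if k >= i)
               for i in range(1, top + 1))
```

For instance, with `pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}` the tail probabilities are 3/4, 1/2, and 1/2, which sum to the same value as the direct mean.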


Example 2p 


Suppose that n elements (call them 1, 2, ..., n) must be stored in a computer in
the form of an ordered list. Each unit of time, a request will be made for one of
these elements, with i being requested, independently of the past, with probability
$P(i)$, $i \ge 1$, $\sum_i P(i) = 1$. Assuming that these probabilities are known, what
ordering minimizes the average position in the line of the element requested?


Solution


Suppose that the elements are numbered so that $P(1) \ge P(2) \ge \cdots \ge P(n)$. To
show that 1, 2, ..., n is the optimal ordering, let X denote the position of the
requested element. Now, under any ordering, say, $O = i_1, i_2, \ldots, i_n$,

$$\begin{aligned}
P_O\{X \ge k\} &= \sum_{j=k}^{n} P(i_j) \\
&\ge \sum_{j=k}^{n} P(j) \\
&= P_{1,2,\ldots,n}\{X \ge k\}
\end{aligned}$$

Summing over k and using Equation (2.6) yields

$$E_O[X] \ge E_{1,2,\ldots,n}[X]$$

thus showing that ordering the elements in decreasing order of the probability
that they are requested minimizes the expected position of the element
requested.
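
For a handful of elements the optimal ordering can be found by trying every permutation. The sketch below uses illustrative names, and the request probabilities in the usage note are invented. It computes the expected position under each ordering and returns the best one, which should list the elements in decreasing order of request probability.

```python
from fractions import Fraction
from itertools import permutations

def expected_position(order, prob):
    """Expected list position of the requested element under a given ordering."""
    return sum(pos * prob[e] for pos, e in enumerate(order, start=1))

def best_ordering(prob):
    """Exhaustive search over all orderings (practical only for small lists)."""
    return min(permutations(prob),
               key=lambda order: expected_position(order, prob))
```

With `prob = {"a": Fraction(1, 5), "b": Fraction(1, 2), "c": Fraction(3, 10)}`, the search returns the ordering b, c, a.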


*7.2.1 Obtaining bounds from expectations via the probabilistic method

The probabilistic method is a technique for analyzing the properties of the elements
of a set by introducing probabilities on the set and then studying an element chosen
according to those probabilities. The technique was previously seen in Example 4l of
Chapter 3, where it was used to show that a set contained an element that
satisfied a certain property. In this subsection, we show how it can sometimes be
used to bound complicated functions.


Let f be a function on the elements of a finite set A, and suppose that we are
interested in

$$m = \max_{s \in A} f(s)$$

A useful lower bound for m can often be obtained by letting S be a random element
of A for which the expected value of f(S) is computable and then noting that
$m \ge f(S)$ implies that

$$m \ge E[f(S)]$$

with strict inequality if f(S) is not a constant random variable. That is, E[f(S)] is a
lower bound on the maximum value.


Example 2q The maximum number of Hamiltonian paths in a tournament

A round-robin tournament of n > 2 contestants is a tournament in which each of
the $\binom{n}{2}$ pairs of contestants play each other exactly once. Suppose that the
players are numbered 1, 2, 3, ..., n. The permutation $i_1, i_2, \ldots, i_n$ is said to be a
Hamiltonian path if $i_1$ beats $i_2$, $i_2$ beats $i_3$, ..., and $i_{n-1}$ beats $i_n$. A problem of
some interest is to determine the largest possible number of Hamiltonian paths.


As an illustration, suppose that there are 3 players. On the one hand, if one of
them wins twice, then there is a single Hamiltonian path. (For instance, if 1 wins
twice and 2 beats 3, then the only Hamiltonian path is 1, 2, 3.) On the other hand,
if each of the players wins once, then there are 3 Hamiltonian paths. (For
instance, if 1 beats 2, 2 beats 3, and 3 beats 1, then 1, 2, 3; 2, 3, 1; and 3, 1, 2
are all Hamiltonian paths.) Hence, when n = 3, there is a maximum of 3 Hamiltonian
paths.


We now show that there is an outcome of the tournament that results in more
than $n!/2^{n-1}$ Hamiltonian paths. To begin, let the outcome of the tournament
specify the result of each of the $\binom{n}{2}$ games played, and let A denote the set of all
$2^{\binom{n}{2}}$ possible tournament outcomes. Then, with f(s) defined as the number of
Hamiltonian paths that result when the outcome is $s \in A$, we are asked to show
that

$$\max_s f(s) > \frac{n!}{2^{n-1}}$$

To show this, consider the randomly chosen outcome S that is obtained when the
results of the $\binom{n}{2}$ games are independent, with each contestant being equally
likely to win each encounter. To determine E[f(S)], the expected number of
Hamiltonian paths that result from the outcome S, number the n!
permutations, and, for $i = 1, \ldots, n!$, let

$$X_i = \begin{cases} 1 & \text{if permutation } i \text{ is a Hamiltonian path} \\ 0 & \text{otherwise} \end{cases}$$

Since

$$f(S) = \sum_i X_i$$

it follows that

$$E[f(S)] = \sum_i E[X_i]$$

Because, by the assumed independence of the outcomes of the games, the
probability that any specified permutation is a Hamiltonian path is $(1/2)^{n-1}$, it follows
that

$$E[X_i] = P\{X_i = 1\} = (1/2)^{n-1}$$

Therefore,

$$E[f(S)] = n!\,(1/2)^{n-1}$$

Since f(S) is not a constant random variable, the preceding equation implies that
there is an outcome of the tournament having more than $n!/2^{n-1}$ Hamiltonian
paths.
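
For very small tournaments the bound can be examined by enumerating every outcome. The sketch below uses an illustrative function name and numbers the players from 0 for convenience. It lists the number of Hamiltonian paths for each of the $2^{\binom{n}{2}}$ outcomes, so that the average can be compared with $n!/2^{n-1}$ and the maximum seen to exceed it.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

def hamiltonian_path_counts(n):
    """Number of Hamiltonian paths for every outcome of an n-player round robin."""
    pairs = list(combinations(range(n), 2))
    counts = []
    for outcome in product((0, 1), repeat=len(pairs)):
        # beats holds ordered pairs (winner, loser) for this outcome
        beats = {(a, b) if r else (b, a) for (a, b), r in zip(pairs, outcome)}
        paths = sum(all((p[k], p[k + 1]) in beats for k in range(n - 1))
                    for p in permutations(range(n)))
        counts.append(paths)
    return counts
```

The enumeration is exponential in the number of games, so it serves only as a sanity check for n = 3 or 4.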


Example 2r 


A grove of 52 trees is arranged in a circular fashion. If 15 chipmunks live in these 
trees, show that there is a group of 7 consecutive trees that together house at 
least 3 chipmunks. 


Solution 


Let the neighborhood of a tree consist of that tree along with the next six trees
visited by moving in the clockwise direction. We want to show that for any choice
of living accommodations of the 15 chipmunks, there is a tree that has at least 3
chipmunks living in its neighborhood. To show this, choose a tree at random and
let X denote the number of chipmunks that live in its neighborhood. To determine
E[X], arbitrarily number the 15 chipmunks and for $i = 1, \ldots, 15$, let

$$X_i = \begin{cases} 1 & \text{if chipmunk } i \text{ lives in the neighborhood of the randomly chosen tree} \\ 0 & \text{otherwise} \end{cases}$$

Because

$$X = \sum_{i=1}^{15} X_i$$

we obtain that

$$E[X] = \sum_{i=1}^{15} E[X_i]$$

However, because $X_i$ will equal 1 if the randomly chosen tree is any of the 7
trees consisting of the tree in which chipmunk i lives along with its 6 neighboring
trees when moving in the counterclockwise direction,

$$E[X_i] = P\{X_i = 1\} = \frac{7}{52}$$

Consequently,

$$E[X] = \frac{105}{52} > 2$$

showing that there exists a tree with more than 2 chipmunks living in its
neighborhood.
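
The averaging argument can be seen concretely for any particular placement. The sketch below uses a hypothetical function name and numbers the trees from 0 in clockwise order. It counts the chipmunks in each tree's neighborhood; the counts always total 15 × 7 = 105 over the 52 trees, so the largest must be at least 3.

```python
def neighborhood_counts(homes, trees=52, width=7):
    """homes[i] is the tree of chipmunk i; returns the count for each starting tree.

    The neighborhood of tree t is t, t + 1, ..., t + width - 1 (mod trees).
    """
    return [sum(1 for h in homes if (h - t) % trees < width)
            for t in range(trees)]
```

Even a placement that spreads the chipmunks as evenly as possible, such as `homes = [(52 * i) // 15 for i in range(15)]`, cannot keep every neighborhood at 2 or fewer.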


*7.2.2 The maximum-minimums identity

We start with an identity relating the maximum of a set of numbers to the minimums
of the subsets of these numbers.


Proposition 2.2


For arbitrary numbers $x_i$, $i = 1, \ldots, n$,

$$\max_i x_i = \sum_i x_i - \sum_{i<j} \min(x_i, x_j) + \sum_{i<j<k} \min(x_i, x_j, x_k) + \cdots + (-1)^{n+1} \min(x_1, \ldots, x_n)$$

Proof We will give a probabilistic proof of the proposition. To begin, assume that
all the $x_i$ are in the interval [0, 1]. Let U be a uniform (0, 1) random variable, and
define the events $A_i$, $i = 1, \ldots, n$, by $A_i = \{U < x_i\}$. That is, $A_i$ is the event that the
uniform random variable is less than $x_i$. Because at least one of these events $A_i$
will occur if U is less than at least one of the values $x_i$, we have that

$$\bigcup_i A_i = \left\{U < \max_i x_i\right\}$$

Therefore,

$$P\left(\bigcup_i A_i\right) = P\left\{U < \max_i x_i\right\} = \max_i x_i$$

Also,

$$P(A_i) = P\{U < x_i\} = x_i$$

In addition, because all of the events $A_{i_1}, \ldots, A_{i_r}$ will occur if U is less than all the
values $x_{i_1}, \ldots, x_{i_r}$, we see that the intersection of these events is

$$A_{i_1} \cdots A_{i_r} = \left\{U < \min_{j=1,\ldots,r} x_{i_j}\right\}$$

implying that

$$P(A_{i_1} \cdots A_{i_r}) = P\left\{U < \min_{j=1,\ldots,r} x_{i_j}\right\} = \min_{j=1,\ldots,r} x_{i_j}$$

Thus, the proposition follows from the inclusion-exclusion formula for the
probability of the union of events:

$$P\left(\bigcup_i A_i\right) = \sum_i P(A_i) - \sum_{i<j} P(A_iA_j) + \sum_{i<j<k} P(A_iA_jA_k) + \cdots + (-1)^{n+1} P(A_1 \cdots A_n)$$

When the $x_i$ are nonnegative, but not restricted to the unit interval, let c be such
that all the $x_i$ are less than c. Then the identity holds for the values $y_i = x_i/c$, and
the desired result follows by multiplying through by c. When the $x_i$ can be
negative, let b be such that $x_i + b > 0$ for all i. Therefore, by the preceding,

$$\max_i (x_i + b) = \sum_i (x_i + b) - \sum_{i<j} \min(x_i + b, x_j + b) + \cdots + (-1)^{n+1} \min(x_1 + b, \ldots, x_n + b)$$

Letting

$$M = \sum_i x_i - \sum_{i<j} \min(x_i, x_j) + \cdots + (-1)^{n+1} \min(x_1, \ldots, x_n)$$

we can rewrite the foregoing identity as

$$\max_i x_i + b = M + b\left(n - \binom{n}{2} + \cdots + (-1)^{n+1}\binom{n}{n}\right)$$

But

$$0 = (1 - 1)^n = 1 - n + \binom{n}{2} + \cdots + (-1)^n \binom{n}{n}$$

The preceding two equations show that

$$\max_i x_i = M$$

and the proposition is proven.
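
The identity can be tried directly on small lists of numbers, including negative and repeated values. The sketch below uses an illustrative function name. It forms the alternating sum of the minimums of all subsets of each size r.

```python
from itertools import combinations

def max_via_minimums(xs):
    """Alternating sum over subset sizes r of the sum of subset minimums."""
    n = len(xs)
    return sum((-1) ** (r + 1) * sum(min(c) for c in combinations(xs, r))
               for r in range(1, n + 1))
```

For example, with `[2, 2, 7]` the three levels contribute 11, -6, and 2, giving 7. The number of subsets is $2^n - 1$, so this is a demonstration of the identity rather than a practical way to compute a maximum.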


It follows from Proposition 2.2 that for any random variables $X_1, \ldots, X_n$,

$$\max_i X_i = \sum_i X_i - \sum_{i<j} \min(X_i, X_j) + \cdots + (-1)^{n+1} \min(X_1, \ldots, X_n)$$

Taking expectations of both sides of this equality yields the following relationship
between the expected value of the maximum and those of the partial minimums:

$$E\left[\max_i X_i\right] = \sum_i E[X_i] - \sum_{i<j} E[\min(X_i, X_j)] + \cdots + (-1)^{n+1} E[\min(X_1, \ldots, X_n)] \tag{2.7}$$

Example 2s Coupon collecting with unequal probabilities 


Suppose there are n types of coupons and that each time one collects a coupon,
it is, independently of previous coupons collected, a type i coupon with
probability $p_i$, $\sum_{i=1}^{n} p_i = 1$. Find the expected number of coupons one needs to
collect to obtain a complete set of at least one of each type.


Solution


If we let $X_i$ denote the number of coupons one needs to collect to obtain a type i,
then we can express X, the number needed to obtain a complete set, as

$$X = \max_{i=1,\ldots,n} X_i$$

Because each new coupon obtained is a type i with probability $p_i$, $X_i$ is a
geometric random variable with parameter $p_i$. Also, because the minimum of $X_i$
and $X_j$ is the number of coupons needed to obtain either a type i or a type j, it
follows that for $i \ne j$, $\min(X_i, X_j)$ is a geometric random variable with parameter
$p_i + p_j$. Similarly, $\min(X_i, X_j, X_k)$, the number needed to obtain any of types i, j,
or k, is a geometric random variable with parameter $p_i + p_j + p_k$, and so on.
Therefore, the identity (2.7) yields

$$E[X] = \sum_i \frac{1}{p_i} - \sum_{i<j} \frac{1}{p_i + p_j} + \sum_{i<j<k} \frac{1}{p_i + p_j + p_k} + \cdots + (-1)^{n+1} \frac{1}{p_1 + \cdots + p_n}$$

Noting that

$$\int_0^{\infty} e^{-px}\,dx = \frac{1}{p}$$

and using the identity

$$1 - \prod_{i=1}^{n} (1 - e^{-p_i x}) = \sum_i e^{-p_i x} - \sum_{i<j} e^{-(p_i + p_j)x} + \cdots + (-1)^{n+1} e^{-(p_1 + \cdots + p_n)x}$$

shows, upon integrating the identity, that

$$E[X] = \int_0^{\infty} \left(1 - \prod_{i=1}^{n} (1 - e^{-p_i x})\right) dx$$

which is a more useful computational form.
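
The alternating-sum form of E[X], with terms 1/(sum of a subset of the $p_i$), can be evaluated exactly when n is small. The sketch below uses an illustrative function name and implements that sum over all nonempty subsets of types. It does not attempt the integral form.

```python
from fractions import Fraction
from itertools import combinations

def expected_draws_for_full_set(p):
    """Alternating sum over nonempty subsets of 1 / (sum of their probabilities)."""
    n = len(p)
    total = Fraction(0)
    for r in range(1, n + 1):
        sign = 1 if r % 2 == 1 else -1
        total += sign * sum(1 / sum(subset) for subset in combinations(p, r))
    return total
```

With equal probabilities the result reduces to n(1 + 1/2 + ... + 1/n), which gives a quick consistency check. The number of terms is $2^n - 1$, which is why the integral form is the more convenient one for larger n.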


7.3 Moments of the Number of Events that Occur


Many of the examples solved in the previous section were of the following form: For
given events $A_1, \ldots, A_n$, find E[X], where X is the number of these events that occur.
The solution then involved defining an indicator variable $I_i$ for event $A_i$ such that

$$I_i = \begin{cases} 1 & \text{if } A_i \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

Because

$$X = \sum_{i=1}^{n} I_i$$

we obtained the result

$$E[X] = E\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} E[I_i] = \sum_{i=1}^{n} P(A_i) \tag{3.1}$$

Now suppose we are interested in the number of pairs of events that occur. Because
$I_iI_j$ will equal 1 if both $A_i$ and $A_j$ occur and will equal 0 otherwise, it follows that the
number of pairs is equal to $\sum_{i<j} I_iI_j$. But because X is the number of events that
occur, it also follows that the number of pairs of events that occur is $\binom{X}{2}$.
Consequently,

$$\binom{X}{2} = \sum_{i<j} I_iI_j$$

where there are $\binom{n}{2}$ terms in the summation. Taking expectations yields

$$E\left[\binom{X}{2}\right] = \sum_{i<j} E[I_iI_j] = \sum_{i<j} P(A_iA_j) \tag{3.2}$$

or

$$E\left[\frac{X(X-1)}{2}\right] = \sum_{i<j} P(A_iA_j)$$

giving that

$$E[X^2] - E[X] = 2\sum_{i<j} P(A_iA_j) \tag{3.3}$$

which yields $E[X^2]$, and thus $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.


Moreover, by considering the number of distinct subsets of k events that all occur, we
see that

$$\binom{X}{k} = \sum_{i_1 < i_2 < \cdots < i_k} I_{i_1}I_{i_2} \cdots I_{i_k}$$

Taking expectations gives the identity

$$E\left[\binom{X}{k}\right] = \sum_{i_1 < i_2 < \cdots < i_k} E[I_{i_1}I_{i_2} \cdots I_{i_k}] = \sum_{i_1 < i_2 < \cdots < i_k} P(A_{i_1}A_{i_2} \cdots A_{i_k}) \tag{3.4}$$


Example 3a Moments of binomial random variables 


Consider n independent trials, with each trial being a success with probability p.
Let $A_i$ be the event that trial i is a success. When $i \ne j$, $P(A_iA_j) = p^2$.
Consequently, Equation (3.2) yields

$$E\left[\binom{X}{2}\right] = \sum_{i<j} p^2 = \binom{n}{2} p^2$$

or

$$E[X(X - 1)] = n(n - 1)p^2$$

or

$$E[X^2] - E[X] = n(n - 1)p^2$$

Now, $E[X] = \sum_{i=1}^{n} P(A_i) = np$, so, from the preceding equation

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = n(n - 1)p^2 + np - (np)^2 = np(1 - p)$$

which is in agreement with the result obtained in Section 4.6.1.


In general, because $P(A_{i_1}A_{i_2} \cdots A_{i_k}) = p^k$, we obtain from Equation (3.4) that

$$E\left[\binom{X}{k}\right] = \sum_{i_1 < i_2 < \cdots < i_k} p^k = \binom{n}{k} p^k$$

or, equivalently,

$$E[X(X - 1) \cdots (X - k + 1)] = n(n - 1) \cdots (n - k + 1)p^k$$

The successive values $E[X^k]$, $k \ge 3$, can be recursively obtained from this
identity. For instance, with k = 3, it yields

$$E[X(X - 1)(X - 2)] = n(n - 1)(n - 2)p^3$$

or

$$E[X^3 - 3X^2 + 2X] = n(n - 1)(n - 2)p^3$$

or

$$\begin{aligned}
E[X^3] &= 3E[X^2] - 2E[X] + n(n - 1)(n - 2)p^3 \\
&= 3n(n - 1)p^2 + np + n(n - 1)(n - 2)p^3
\end{aligned}$$
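
The third-moment formula can be compared with a direct sum over the binomial probability mass function. The sketch below uses illustrative function names. It uses exact rational arithmetic, so the comparison is an equality test.

```python
from fractions import Fraction
from math import comb

def binomial_moment(n, p, power):
    """E[X^power] for X binomial (n, p), summed directly over the pmf."""
    return sum(k ** power * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

def third_moment_formula(n, p):
    """E[X^3] = 3n(n-1)p^2 + np + n(n-1)(n-2)p^3, from the factorial moments."""
    return 3 * n * (n - 1) * p ** 2 + n * p + n * (n - 1) * (n - 2) * p ** 3
```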


Example 3b Moments of hypergeometric random variables


Suppose n balls are randomly selected from an urn containing N balls, of which
m are white. Let $A_i$ be the event that the ith ball selected is white. Then X, the
number of white balls selected, is equal to the number of the events $A_1, \ldots, A_n$ that
occur. Because the ith ball selected is equally likely to be any of the N balls, of
which m are white, $P(A_i) = m/N$. Consequently, Equation (3.1) gives that
$E[X] = \sum_{i=1}^{n} P(A_i) = nm/N$. Also, since

$$P(A_iA_j) = P(A_i)P(A_j \mid A_i) = \frac{m}{N}\,\frac{m-1}{N-1}$$

we obtain, from Equation (3.2), that

$$E\left[\binom{X}{2}\right] = \sum_{i<j} \frac{m(m-1)}{N(N-1)} = \binom{n}{2} \frac{m(m-1)}{N(N-1)}$$

or

$$E[X(X - 1)] = n(n - 1)\frac{m(m-1)}{N(N-1)}$$

showing that

$$E[X^2] = n(n - 1)\frac{m(m-1)}{N(N-1)} + E[X]$$

This formula yields the variance of the hypergeometric, namely,

$$\begin{aligned}
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
&= n(n - 1)\frac{m(m-1)}{N(N-1)} + \frac{nm}{N} - \frac{n^2m^2}{N^2} \\
&= \frac{mn}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1 - \frac{mn}{N}\right]
\end{aligned}$$

which agrees with the result obtained in Example 8j of Chapter 4.
Higher moments of X are obtained by using Equation (3.4). Because

$$P(A_{i_1}A_{i_2} \cdots A_{i_k}) = \frac{m(m-1) \cdots (m-k+1)}{N(N-1) \cdots (N-k+1)}$$

Equation (3.4) yields

$$E\left[\binom{X}{k}\right] = \binom{n}{k} \frac{m(m-1) \cdots (m-k+1)}{N(N-1) \cdots (N-k+1)}$$

or

$$E[X(X - 1) \cdots (X - k + 1)] = n(n - 1) \cdots (n - k + 1)\frac{m(m-1) \cdots (m-k+1)}{N(N-1) \cdots (N-k+1)}$$


Example 3c Moments in the match problem 


For $i = 1, \ldots, N$, let $A_i$ be the event that person i selects his or her own hat in the
match problem. Then

$$P(A_iA_j) = P(A_i)P(A_j \mid A_i) = \frac{1}{N}\,\frac{1}{N-1}$$

which follows because, conditional on person i selecting his or her own hat, the hat
selected by person j is equally likely to be any of the other N - 1 hats, of which
one is his or her own. Consequently, with X equal to the number of people who select
their own hat, it follows from Equation (3.2) that

$$E\left[\binom{X}{2}\right] = \sum_{i<j} \frac{1}{N(N-1)} = \binom{N}{2} \frac{1}{N(N-1)}$$

thus showing that

$$E[X(X - 1)] = 1$$

Therefore, $E[X^2] = 1 + E[X]$. Because $E[X] = \sum_{i=1}^{N} P(A_i) = 1$, we obtain that

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 1$$
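
These moments can be confirmed for a small number of people by listing every assignment of hats. The sketch below uses an illustrative function name. It computes $E[X(X-1)\cdots(X-k+1)]$ by enumerating all N! permutations; k = 1 gives the mean and k = 2 gives E[X(X - 1)].

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def factorial_moment_of_matches(N, k):
    """E[X(X-1)...(X-k+1)] where X counts fixed points of a random permutation."""
    total = 0
    for perm in permutations(range(N)):
        x = sum(1 for person, hat in enumerate(perm) if person == hat)
        term = 1
        for r in range(k):
            term *= x - r
        total += term
    return Fraction(total, factorial(N))
```

The variance follows as `factorial_moment_of_matches(N, 2) + mean - mean**2`, which equals 1 whenever N is at least 2.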


Hence, both the mean and variance of the number of matches are 1. For higher
moments, we use Equation (3.4), along with the fact that

$$P(A_{i_1}A_{i_2} \cdots A_{i_k}) = \frac{1}{N(N-1) \cdots (N-k+1)}$$

to obtain

$$E\left[\binom{X}{k}\right] = \binom{N}{k} \frac{1}{N(N-1) \cdots (N-k+1)}$$

or

$$E[X(X - 1) \cdots (X - k + 1)] = 1$$


Example 3d Another coupon-collecting problem 


Suppose that there are N distinct types of coupons and that, independently of
past types collected, each new one obtained is type j with probability $p_j$,
$\sum_{j=1}^{N} p_j = 1$. Find the expected value and variance of the number of different
types of coupons that appear among the first n collected.


Solution


We will find it more convenient to work with the number of uncollected types. So,
let Y equal the number of types of coupons collected, and let X = N - Y denote
the number of uncollected types. With $A_i$ defined as the event that there are no
type i coupons in the collection, X is equal to the number of the events $A_1, \ldots, A_N$
that occur. Because the types of the successive coupons collected are
independent, and, with probability $1 - p_i$, each new coupon is not type i, we have

$$P(A_i) = (1 - p_i)^n$$

Hence, $E[X] = \sum_{i=1}^{N} (1 - p_i)^n$, from which it follows that

$$E[Y] = N - E[X] = N - \sum_{i=1}^{N} (1 - p_i)^n$$

Similarly, because each of the n coupons collected is neither a type i nor a type j
coupon, with probability $1 - p_i - p_j$, we have

$$P(A_iA_j) = (1 - p_i - p_j)^n, \qquad i \ne j$$

Thus,

$$E[X(X - 1)] = 2\sum_{i<j} P(A_iA_j) = 2\sum_{i<j} (1 - p_i - p_j)^n$$

or

$$E[X^2] = 2\sum_{i<j} (1 - p_i - p_j)^n + E[X]$$

Hence, we obtain

$$\begin{aligned}
\mathrm{Var}(Y) &= \mathrm{Var}(X) \\
&= E[X^2] - (E[X])^2 \\
&= 2\sum_{i<j} (1 - p_i - p_j)^n + \sum_{i=1}^{N} (1 - p_i)^n - \left(\sum_{i=1}^{N} (1 - p_i)^n\right)^2
\end{aligned}$$

In the special case where $p_i = 1/N$, $i = 1, \ldots, N$, the preceding formulas give

$$E[Y] = N\left[1 - \left(1 - \frac{1}{N}\right)^n\right]$$

and

$$\mathrm{Var}(Y) = N(N - 1)\left(1 - \frac{2}{N}\right)^n + N\left(1 - \frac{1}{N}\right)^n - N^2\left(1 - \frac{1}{N}\right)^{2n}$$


Example 3e The negative hypergeometric random variables 


Suppose an urn contains n + m balls, of which n are special and m are ordinary. 
These items are removed one at a time, with each new removal being equally 
likely to be any of the balls that remain in the urn. The random variable Y, equal 
to the number of balls that need be withdrawn until a total of r special balls have 
been removed, is said to have a negative hypergeometric distribution. The 
negative hypergeometric distribution bears the same relationship to the 
hypergeometric distribution as the negative binomial does to the binomial. That 
is, in both cases, rather than considering a random variable equal to the number 
of successes in a fixed number of trials (as are the binomial and hypergeometric 
variables), they refer to the number of trials needed to obtain a fixed number of 
successes. 


To obtain the probability mass function of a negative hypergeometric random
variable Y, note that Y will equal k if both


1. the first k - 1 withdrawals consist of r - 1 special and k - r ordinary balls
and
2. the kth ball withdrawn is special.


Consequently,

$$P\{Y = k\} = \frac{\binom{n}{r-1}\binom{m}{k-r}}{\binom{n+m}{k-1}} \cdot \frac{n-r+1}{n+m-k+1}$$

We will not, however, utilize the preceding probability mass function to obtain the
mean and variance of Y. Rather, let us number the m ordinary balls as $o_1, \ldots, o_m$,
and then, for each $i = 1, \ldots, m$, let $A_i$ be the event that $o_i$ is withdrawn before r
special balls have been removed. Then, if X is the number of the events $A_1, \ldots, A_m$
that occur, it follows that X is the number of ordinary balls that are withdrawn
before a total of r special balls have been removed. Consequently,


$$Y = r + X$$

showing that

$$E[Y] = r + E[X] = r + \sum_{i=1}^{m} P(A_i)$$

To determine $P(A_i)$, consider the n + 1 balls consisting of $o_i$ along with the n
special balls. Of these n + 1 balls, $o_i$ is equally likely to be the first one
withdrawn, or the second one withdrawn, ..., or the final one withdrawn. Hence,
the probability that it is among the first r of these to be selected (and so is
removed before a total of r special balls have been withdrawn) is $\frac{r}{n+1}$.
Consequently,

$$P(A_i) = \frac{r}{n+1}$$

and

$$E[Y] = r + m\frac{r}{n+1} = \frac{r(n+m+1)}{n+1}$$

Thus, for instance, the expected number of cards of a well-shuffled deck that
would need to be turned over until a spade appears is $1 + \frac{39}{14} \approx 3.786$, and the
expected number of cards that would need to be turned over until an ace
appears is $1 + \frac{48}{5} = 10.6$.


To determine Var(Y) = Var(X), we use the identity

$$E[X(X - 1)] = 2\sum_{i<j} P(A_iA_j)$$

Now, $P(A_iA_j)$ is the probability that both $o_i$ and $o_j$ are removed before there have
been a total of r special balls removed. So consider the n + 2 balls consisting of
$o_i$, $o_j$, and the n special balls. Because all withdrawal orderings of these balls are
equally likely, the probability that $o_i$ and $o_j$ are both among the first r + 1 of them
to be removed (and so are both removed before r special balls have been
withdrawn) is

$$P(A_iA_j) = \frac{\binom{2}{2}\binom{n}{r-1}}{\binom{n+2}{r+1}} = \frac{r(r+1)}{(n+1)(n+2)}$$

Consequently,

$$E[X(X - 1)] = 2\binom{m}{2}\frac{r(r+1)}{(n+1)(n+2)}$$

so

$$E[X^2] = m(m - 1)\frac{r(r+1)}{(n+1)(n+2)} + E[X]$$

Because $E[X] = m\frac{r}{n+1}$, this yields

$$\mathrm{Var}(Y) = \mathrm{Var}(X) = m(m - 1)\frac{r(r+1)}{(n+1)(n+2)} + m\frac{r}{n+1} - \left(m\frac{r}{n+1}\right)^2$$

A little algebra now shows that

$$\mathrm{Var}(Y) = \frac{mr(n+1-r)(n+m+1)}{(n+1)^2(n+2)}$$
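
The mean and variance obtained from the indicator argument can be compared with values computed directly from a probability mass function. The sketch below assumes the pmf implied by the two conditions stated earlier in this example: the first k - 1 draws contain r - 1 special balls, and the kth draw is special. Its function names are illustrative.

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(n, m, r):
    """P{Y = k} for k = r, ..., r + m, with n special and m ordinary balls (1 <= r <= n)."""
    pmf = {}
    for k in range(r, r + m + 1):
        first = Fraction(comb(n, r - 1) * comb(m, k - r), comb(n + m, k - 1))
        pmf[k] = first * Fraction(n - r + 1, n + m - k + 1)
    return pmf

def mean_and_variance(pmf):
    mean = sum(k * p for k, p in pmf.items())
    second = sum(k * k * p for k, p in pmf.items())
    return mean, second - mean ** 2

def formula_mean(n, m, r):
    return r + Fraction(m * r, n + 1)

def formula_variance(n, m, r):
    return Fraction(m * r * (n + 1 - r) * (n + m + 1), (n + 1) ** 2 * (n + 2))
```

With n = 13, m = 39, r = 1 this reproduces the spade example, and with n = 4, m = 48, r = 1 the ace example.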


Example 3f Singletons in the coupon collector’s problem 


Suppose that there are n distinct types of coupons and that, independently of
past types collected, each new one obtained is equally likely to be any of the n
types. Suppose also that one continues to collect coupons until a complete set of
at least one of each type has been obtained. Find the expected value and
variance of the number of types for which exactly one coupon of that type is
collected.


Solution


Let X equal the number of types for which exactly one of that type is collected.
Also, let $T_i$ denote the ith type of coupon to be collected, and let $A_i$ be the event
that there is only a single type $T_i$ coupon in the complete set. Because X is equal
to the number of the events $A_1, \ldots, A_n$ that occur, we have

$$E[X] = \sum_{i=1}^{n} P(A_i)$$

Now, at the moment when the first type $T_i$ coupon is collected, there remain n - i
types that need to be collected to have a complete set. Because, starting at this
moment, each of these n - i + 1 types (the n - i not yet collected and type $T_i$) is
equally likely to be the last of these types to be collected, it follows that the type
$T_i$ will be the last of these types (and so will be a singleton) with probability
$\frac{1}{n-i+1}$. Consequently, $P(A_i) = \frac{1}{n-i+1}$, yielding

$$E[X] = \sum_{i=1}^{n} \frac{1}{n-i+1} = \sum_{i=1}^{n} \frac{1}{i}$$

To determine the variance of the number of singletons, let $S_{i,j}$, for i < j, be the
event that the first type $T_i$ coupon to be collected is still the only one of its type to
have been collected at the moment that the first type $T_j$ coupon has been
collected. Then

$$P(A_iA_j) = P(A_iA_j \mid S_{i,j})P(S_{i,j})$$

Now, $P(S_{i,j})$ is the probability that when a type $T_i$ has just been collected, of the
n - i + 1 types consisting of type $T_i$ and the n - i as yet uncollected types, a type
$T_i$ is not among the first j - i of these types to be collected. Because type $T_i$ is
equally likely to be the first, or second, or ..., (n - i + 1)st of these types to be
collected, we have

$$P(S_{i,j}) = 1 - \frac{j-i}{n-i+1} = \frac{n+1-j}{n+1-i}$$

Now, conditional on the event $S_{i,j}$, both $A_i$ and $A_j$ will occur if, at the time the first
type $T_j$ coupon is collected, of the n - j + 2 types consisting of types $T_i$, $T_j$, and
the n - j as yet uncollected types, $T_i$ and $T_j$ are both collected after the other
n - j. But this implies that

$$P(A_iA_j \mid S_{i,j}) = 2\,\frac{1}{n-j+2}\,\frac{1}{n-j+1}$$

Therefore,

$$P(A_iA_j) = \frac{2}{(n+1-i)(n+2-j)}, \qquad i < j$$

yielding

$$E[X(X - 1)] = 4\sum_{i<j} \frac{1}{(n+1-i)(n+2-j)}$$

Consequently, using the previous result for E[X], we obtain

$$\mathrm{Var}(X) = 4\sum_{i<j} \frac{1}{(n+1-i)(n+2-j)} + \sum_{i=1}^{n} \frac{1}{i} - \left(\sum_{i=1}^{n} \frac{1}{i}\right)^2$$


7.4 Covariance, Variance of Sums, and Correlations


The following proposition shows that the expectation of a product of independent 
random variables is equal to the product of their expectations. 


Proposition 4.1 


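If X and Y are independent, then, for any functions h and g,

$$E[g(X)h(Y)] = E[g(X)]E[h(Y)]$$

Proof Suppose that X and Y are jointly continuous with joint density f(x, y). Then

$$\begin{aligned}
E[g(X)h(Y)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f_X(x)f_Y(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} h(y)f_Y(y)\,dy \int_{-\infty}^{\infty} g(x)f_X(x)\,dx \\
&= E[h(Y)]E[g(X)]
\end{aligned}$$

The proof in the discrete case is similar.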


Just as the expected value and the variance of a single random variable give us 
information about that random variable, so does the covariance between two random 
variables give us information about the relationship between the random variables. 


Definition 
The covariance between X and Y, denoted by Cov(X, Y), is defined by

$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

Upon expanding the right side of the preceding definition, we see that

$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E[XY - E[X]Y - XE[Y] + E[Y]E[X]] \\
&= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] \\
&= E[XY] - E[X]E[Y]
\end{aligned}$$

Note that if X and Y are independent, then, by Proposition 4.1, Cov(X, Y) = 0.
However, the converse is not true. A simple example of two dependent random
variables X and Y having zero covariance is obtained by letting X be a random
variable such that

$$P\{X = 0\} = P\{X = 1\} = P\{X = -1\} = \frac{1}{3}$$

and defining

$$Y = \begin{cases} 0 & \text{if } X \ne 0 \\ 1 & \text{if } X = 0 \end{cases}$$

Now, XY = 0, so E[XY] = 0. Also, E[X] = 0. Thus,

$$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0$$

However, X and Y are clearly not independent.
The following proposition lists some of the properties of covariance. 


Proposition 4.2 


i. Cov (X,Y) = Cov (Y,X) 
ii. Cov (X,X) = Cov(X) 
ili. Cov (aX, Y) =a Cov (X,Y) 


n m n m 
iv. Cov ». Xi, es y Ps Cov (Xi, Y;) 
i=1 J =1 — = 


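Proof of Proposition 4.2 Parts (i) and (ii) follow immediately from the definition
of covariance, and part (iii) is left as an exercise for the reader. To prove part (iv),
which states that the covariance operation is additive (as is the operation of
taking expectations), let $\mu_i = E[X_i]$ and $\nu_j = E[Y_j]$. Then

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mu_i, \qquad E\left[\sum_{j=1}^{m} Y_j\right] = \sum_{j=1}^{m} \nu_j$$

and

$$
\begin{aligned}
\operatorname{Cov}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right)
&= E\left[\left(\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i\right)\left(\sum_{j=1}^{m} Y_j - \sum_{j=1}^{m} \nu_j\right)\right] \\
&= E\left[\sum_{i=1}^{n} (X_i - \mu_i) \sum_{j=1}^{m} (Y_j - \nu_j)\right] \\
&= E\left[\sum_{i=1}^{n}\sum_{j=1}^{m} (X_i - \mu_i)(Y_j - \nu_j)\right] \\
&= \sum_{i=1}^{n}\sum_{j=1}^{m} E[(X_i - \mu_i)(Y_j - \nu_j)]
\end{aligned}
$$

where the last equality follows because the expected value of a sum of random
variables is equal to the sum of the expected values.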


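It follows from parts (ii) and (iv) of Proposition 4.2, upon taking
$Y_j = X_j$, $j = 1, \ldots, n$, that

$$
\begin{aligned}
\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right)
&= \operatorname{Cov}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{n} X_j\right) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j) \\
&= \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum\sum_{i \neq j} \operatorname{Cov}(X_i, X_j)
\end{aligned}
$$

Since each pair of indices i, j, i ≠ j, appears twice in the double summation, the
preceding formula is equivalent to

$$\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2\sum\sum_{i < j} \operatorname{Cov}(X_i, X_j) \tag{4.1}$$

If $X_1, \ldots, X_n$ are pairwise independent, in that $X_i$ and $X_j$ are independent for i ≠ j,
then Equation (4.1) reduces to

$$\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i)$$

The following examples illustrate the use of Equation (4.1).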


Example 4a 


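Let $X_1, \ldots, X_n$ be independent and identically distributed random variables having
expected value $\mu$ and variance $\sigma^2$, and as in Example 2c, let
$\bar{X} = \sum_{i=1}^{n} X_i/n$ be the sample mean. The quantities $X_i - \bar{X}$,
$i = 1, \ldots, n$, are called deviations, as they equal the differences between the
individual data and the sample mean. The random variable

$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}$$

is called the sample variance. Find (a) $\operatorname{Var}(\bar{X})$ and (b) $E[S^2]$.


Solution

a.
$$
\begin{aligned}
\operatorname{Var}(\bar{X}) &= \left(\frac{1}{n}\right)^2 \operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) \\
&= \left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} \operatorname{Var}(X_i) \quad \text{by independence} \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$

b. We start with the following algebraic identity:

$$
\begin{aligned}
(n - 1)S^2 &= \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2 \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + \sum_{i=1}^{n} (\bar{X} - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \mu) \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 + n(\bar{X} - \mu)^2 - 2(\bar{X} - \mu)\,n(\bar{X} - \mu) \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2
\end{aligned}
$$

Taking expectations of the preceding yields

$$
\begin{aligned}
(n - 1)E[S^2] &= \sum_{i=1}^{n} E[(X_i - \mu)^2] - nE[(\bar{X} - \mu)^2] \\
&= n\sigma^2 - n\operatorname{Var}(\bar{X}) \\
&= (n - 1)\sigma^2
\end{aligned}
$$

where the final equality made use of part (a) of this example and the one
preceding it made use of the result of Example 2c, namely, that $E[\bar{X}] = \mu$.
Dividing through by n - 1 shows that the expected value of the sample variance
is the distribution variance $\sigma^2$.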
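
Both parts of Example 4a can be checked exactly on a small case. The sketch below is only an illustration: the three-point distribution and the sample size are assumptions chosen so that all 27 equally likely samples can be enumerated with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 4]   # hypothetical support; each value has probability 1/3
n = 3                # sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

def sample_variance(xs):
    """S^2 with the n - 1 divisor, computed exactly."""
    xbar = Fraction(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

samples = list(product(values, repeat=n))   # all equally likely i.i.d. samples
weight = Fraction(1, len(samples))

# Var(X-bar), using E[X-bar] = mu, and E[S^2], both by direct enumeration.
var_xbar = sum(weight * (Fraction(sum(xs), n) - mu) ** 2 for xs in samples)
expected_s2 = sum(weight * sample_variance(xs) for xs in samples)

print(var_xbar, sigma2 / n)    # part (a): the two values agree
print(expected_s2, sigma2)     # part (b): the two values agree
```

Dividing by n rather than n - 1 in `sample_variance` would make the second comparison fail, which is the reason for the n - 1 divisor in the definition of the sample variance.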


Our next example presents another method for obtaining the variance of a binomial 
random variable. 


Example 4b Variance of a binomial random variable 


Compute the variance of a binomial random variable X with parameters n and p. 


Solution 


Since such a random variable represents the number of successes in n 
independent trials when each trial has the common probability p of being a 
success, we may write 


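$$X = X_1 + \cdots + X_n$$

where the $X_i$ are independent Bernoulli random variables such that

$$X_i = \begin{cases} 1 & \text{if the } i\text{th trial is a success} \\ 0 & \text{otherwise} \end{cases}$$

Hence, from Equation (4.1), we obtain

$$\operatorname{Var}(X) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)$$

But

$$
\begin{aligned}
\operatorname{Var}(X_i) &= E[X_i^2] - (E[X_i])^2 \\
&= E[X_i] - (E[X_i])^2 \quad \text{since } X_i^2 = X_i \\
&= p - p^2
\end{aligned}
$$

Thus,

$$\operatorname{Var}(X) = np(1 - p)$$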


Example 4c Sampling from a finite population 


Consider a set of N people, each of whom has an opinion about a certain subject
that is measured by a real number v that represents the person’s “strength of
feeling” about the subject. Let $v_i$ represent the strength of feeling of person
i, i = 1, ..., N.

Suppose that the quantities $v_i$, i = 1, ..., N, are unknown and, to gather
information, a group of n of the N people is “randomly chosen” in the sense that
all of the $\binom{N}{n}$ subsets of size n are equally likely to be chosen. These n people
are then questioned and their feelings determined. If S denotes the sum of the n
sampled values, determine its mean and variance.

An important application of the preceding problem is to a forthcoming election in
which each person in the population is either for or against a certain candidate or
proposition. If we take $v_i$ to equal 1 if person i is in favor and 0 if he or she is
against, then $\bar{v} = \sum_{i=1}^{N} v_i/N$ represents the proportion of the population that is in
favor. To estimate $\bar{v}$, a random sample of n people is chosen, and these people
are polled. The proportion of those polled who are in favor (that is, S/n) is often
used as an estimate of $\bar{v}$.


Solution 


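For each person i, i = 1, ..., N, define an indicator variable $I_i$ to indicate whether or
not that person is included in the sample. That is,

$$I_i = \begin{cases} 1 & \text{if person } i \text{ is in the random sample} \\ 0 & \text{otherwise} \end{cases}$$

Now, S can be expressed by

$$S = \sum_{i=1}^{N} v_i I_i$$

so

$$
\begin{aligned}
E[S] &= \sum_{i=1}^{N} v_i E[I_i] \\
\operatorname{Var}(S) &= \sum_{i=1}^{N} \operatorname{Var}(v_i I_i) + 2\sum\sum_{i < j} \operatorname{Cov}(v_i I_i, v_j I_j) \\
&= \sum_{i=1}^{N} v_i^2 \operatorname{Var}(I_i) + 2\sum\sum_{i < j} v_i v_j \operatorname{Cov}(I_i, I_j)
\end{aligned}
$$

Because

$$E[I_i] = \frac{n}{N}, \qquad E[I_i I_j] = \frac{n}{N}\,\frac{n - 1}{N - 1}$$

it follows that

$$
\begin{aligned}
\operatorname{Var}(I_i) &= \frac{n}{N}\left(1 - \frac{n}{N}\right) \\
\operatorname{Cov}(I_i, I_j) &= \frac{n(n - 1)}{N(N - 1)} - \left(\frac{n}{N}\right)^2 = \frac{-n(N - n)}{N^2(N - 1)}
\end{aligned}
$$

Hence,

$$
\begin{aligned}
E[S] &= n\sum_{i=1}^{N} \frac{v_i}{N} = n\bar{v} \\
\operatorname{Var}(S) &= \frac{n}{N}\left(\frac{N - n}{N}\right)\sum_{i=1}^{N} v_i^2 - \frac{2n(N - n)}{N^2(N - 1)}\sum\sum_{i < j} v_i v_j
\end{aligned}
$$

The expression for Var(S) can be simplified somewhat by using the identity
$(v_1 + \cdots + v_N)^2 = \sum_{i=1}^{N} v_i^2 + 2\sum\sum_{i < j} v_i v_j$. After some simplification, we obtain

$$\operatorname{Var}(S) = \frac{n(N - n)}{N - 1}\left(\frac{\sum_{i=1}^{N} v_i^2}{N} - \bar{v}^2\right)$$

Consider now the special case in which Np of the v’s are equal to 1 and the
remainder equal to 0. Then, in this case, S is a hypergeometric random variable
and has mean and variance given, respectively, by

$$E[S] = n\bar{v} = np \quad \text{since } \bar{v} = \frac{Np}{N} = p$$

and

$$\operatorname{Var}(S) = \frac{n(N - n)}{N - 1}\left(\frac{Np}{N} - p^2\right) = \frac{n(N - n)}{N - 1}\,p(1 - p)$$

The quantity S/n, equal to the proportion of those sampled that have values
equal to 1, is such that

$$E\left[\frac{S}{n}\right] = p, \qquad \operatorname{Var}\left(\frac{S}{n}\right) = \frac{N - n}{n(N - 1)}\,p(1 - p)$$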
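
The mean and variance formulas of Example 4c can be confirmed by brute force on a small population. In the sketch below the population values and the sample size are made up for illustration; every subset of size n is enumerated, so the comparison is exact.

```python
from fractions import Fraction
from itertools import combinations

v = [3, -1, 4, 1, 5, 9]   # hypothetical "strength of feeling" values
N, n = len(v), 3          # population size and sample size

# Every subset of size n is equally likely; combinations() works by position,
# so repeated values in v would still be handled correctly.
sums = [sum(subset) for subset in combinations(v, n)]
weight = Fraction(1, len(sums))

mean_S = sum(weight * s for s in sums)
var_S = sum(weight * (s - mean_S) ** 2 for s in sums)

# Closed forms from the example.
vbar = Fraction(sum(v), N)
formula_mean = n * vbar
formula_var = Fraction(n * (N - n), N - 1) * (Fraction(sum(x * x for x in v), N) - vbar ** 2)

print(mean_S, formula_mean)
print(var_S, formula_var)
```

Setting n = N makes the variance zero, as the factor N - n in the formula predicts: sampling the whole population leaves nothing random.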
The correlation of two random variables X and Y, denoted by $\rho(X, Y)$, is defined,
as long as Var(X)Var(Y) is positive, by

$$\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$$

It can be shown that

$$-1 \le \rho(X, Y) \le 1 \tag{4.2}$$

To prove Equation (4.2), suppose that X and Y have variances given by $\sigma_x^2$ and
$\sigma_y^2$, respectively. Then, on the one hand,

$$
\begin{aligned}
0 &\le \operatorname{Var}\left(\frac{X}{\sigma_x} + \frac{Y}{\sigma_y}\right) \\
&= \frac{\operatorname{Var}(X)}{\sigma_x^2} + \frac{\operatorname{Var}(Y)}{\sigma_y^2} + \frac{2\operatorname{Cov}(X, Y)}{\sigma_x\sigma_y} \\
&= 2[1 + \rho(X, Y)]
\end{aligned}
$$

implying that

$$-1 \le \rho(X, Y)$$

On the other hand,

$$
\begin{aligned}
0 &\le \operatorname{Var}\left(\frac{X}{\sigma_x} - \frac{Y}{\sigma_y}\right) \\
&= \frac{\operatorname{Var}(X)}{\sigma_x^2} + \frac{\operatorname{Var}(Y)}{(-\sigma_y)^2} - \frac{2\operatorname{Cov}(X, Y)}{\sigma_x\sigma_y} \\
&= 2[1 - \rho(X, Y)]
\end{aligned}
$$

implying that

$$\rho(X, Y) \le 1$$

which completes the proof of Equation (4.2).

In fact, since Var(Z) = 0 implies that Z is constant with probability 1 (this intuitive
relationship will be rigorously proven in Chapter 8), it follows from the proof of
Equation (4.2) that $\rho(X, Y) = 1$ implies that Y = a + bX, where $b = \sigma_y/\sigma_x > 0$,
and $\rho(X, Y) = -1$ implies that Y = a + bX, where $b = -\sigma_y/\sigma_x < 0$. We leave it
as an exercise for the reader to show that the reverse is also true: that if
Y = a + bX, then $\rho(X, Y)$ is either +1 or -1, depending on the sign of b.


The correlation coefficient is a measure of the degree of linearity between X and Y. A
value of $\rho(X, Y)$ near +1 or -1 indicates a high degree of linearity between X and Y,
whereas a value near 0 indicates that such linearity is absent. A positive value of
$\rho(X, Y)$ indicates that Y tends to increase when X does, whereas a negative value
indicates that Y tends to decrease when X increases. If $\rho(X, Y) = 0$, then X and Y are
said to be uncorrelated.
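
Uncorrelated does not mean independent. The short sketch below revisits the three-point example from earlier in this section (X equally likely to be -1, 0, or 1, and Y equal to 1 exactly when X = 0) and computes the covariance with exact fractions; the helper name `expect` is just an illustrative choice.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint pmf of (X, Y): X is -1, 0, or 1 with probability 1/3 each; Y = 1 iff X = 0.
joint = {(x, 1 if x == 0 else 0): third for x in (-1, 0, 1)}

def expect(f):
    """Expected value of f(x, y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P{X = 0, Y = 1} = P{X = 0} P{Y = 1}.
p_joint = joint[(0, 1)]
p_product = third * expect(lambda x, y: y)   # P{Y = 1} = E[Y] for an indicator

print(cov)                  # zero covariance, hence zero correlation
print(p_joint, p_product)   # the two differ, so X and Y are dependent
```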


Example 4d 


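Let $I_A$ and $I_B$ be indicator variables for the events A and B. That is,

$$I_A = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{otherwise} \end{cases} \qquad I_B = \begin{cases} 1 & \text{if } B \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$

Then

$$E[I_A] = P(A), \qquad E[I_B] = P(B), \qquad E[I_A I_B] = P(AB)$$

so

$$
\begin{aligned}
\operatorname{Cov}(I_A, I_B) &= P(AB) - P(A)P(B) \\
&= P(B)[P(A|B) - P(A)]
\end{aligned}
$$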


Thus, we obtain the quite intuitive result that the indicator variables for A and B 
are either positively correlated, uncorrelated, or negatively correlated, depending 
on whether P(A|B) is, respectively, greater than, equal to, or less than P(A). 


Our next example shows that the sample mean and a deviation from the sample 
mean are uncorrelated. 


Example 4e 


Let $X_1, \ldots, X_n$ be independent and identically distributed random variables having
variance $\sigma^2$. Show that

$$\operatorname{Cov}(X_i - \bar{X}, \bar{X}) = 0$$


Solution

We have

$$
\begin{aligned}
\operatorname{Cov}(X_i - \bar{X}, \bar{X}) &= \operatorname{Cov}(X_i, \bar{X}) - \operatorname{Cov}(\bar{X}, \bar{X}) \\
&= \operatorname{Cov}\left(X_i, \frac{1}{n}\sum_{j=1}^{n} X_j\right) - \operatorname{Var}(\bar{X}) \\
&= \frac{1}{n}\sum_{j=1}^{n} \operatorname{Cov}(X_i, X_j) - \frac{\sigma^2}{n} \\
&= \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0
\end{aligned}
$$

where the next-to-last equality uses the result of Example 4a and the final
equality follows because

$$\operatorname{Cov}(X_i, X_j) = \begin{cases} 0 & \text{if } j \neq i \text{, by independence} \\ \sigma^2 & \text{if } j = i \text{, since } \operatorname{Var}(X_i) = \sigma^2 \end{cases}$$

Although $\bar{X}$ and the deviation $X_i - \bar{X}$ are uncorrelated, they are not, in general,
independent. However, in the special case where the $X_i$ are normal random
variables, it turns out that not only is $\bar{X}$ independent of a single deviation, but it is
independent of the entire sequence of deviations $X_j - \bar{X}$, j = 1, ..., n. This result
will be established in Section 7.8, where we will also show that, in this case,
the sample mean $\bar{X}$ and the sample variance $S^2$ are independent, with
$(n - 1)S^2/\sigma^2$ having a chi-squared distribution with n - 1 degrees of freedom.
(See Example 4a for the definition of $S^2$.)


Example 4f 


Consider m independent trials, each of which results in any of r possible
outcomes with probabilities $p_1, \ldots, p_r$, $\sum_{i=1}^{r} p_i = 1$. If we let $N_i$, i = 1, ..., r, denote
the number of the m trials that result in outcome i, then $N_1, N_2, \ldots, N_r$ have the
multinomial distribution

$$P\{N_1 = n_1, \ldots, N_r = n_r\} = \frac{m!}{n_1!\,n_2! \cdots n_r!}\, p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}, \qquad \sum_{i=1}^{r} n_i = m$$

For i ≠ j, it seems likely that when $N_i$ is large, $N_j$ would tend to be small; hence,
it is intuitive that they should be negatively correlated. Let us compute their
covariance by using Proposition 4.2 (iv) and the representation

$$N_i = \sum_{k=1}^{m} I_i(k) \quad \text{and} \quad N_j = \sum_{k=1}^{m} I_j(k)$$

where

$$I_i(k) = \begin{cases} 1 & \text{if trial } k \text{ results in outcome } i \\ 0 & \text{otherwise} \end{cases} \qquad I_j(k) = \begin{cases} 1 & \text{if trial } k \text{ results in outcome } j \\ 0 & \text{otherwise} \end{cases}$$

From Proposition 4.2 (iv), we have

$$\operatorname{Cov}(N_i, N_j) = \sum_{\ell=1}^{m}\sum_{k=1}^{m} \operatorname{Cov}(I_i(k), I_j(\ell))$$

Now, on the one hand, when $k \neq \ell$,

$$\operatorname{Cov}(I_i(k), I_j(\ell)) = 0$$

since the outcome of trial k is independent of the outcome of trial $\ell$. On the other
hand,

$$
\begin{aligned}
\operatorname{Cov}(I_i(\ell), I_j(\ell)) &= E[I_i(\ell)I_j(\ell)] - E[I_i(\ell)]E[I_j(\ell)] \\
&= 0 - p_i p_j = -p_i p_j
\end{aligned}
$$

where the equation uses the fact that $I_i(\ell)I_j(\ell) = 0$, since trial $\ell$ cannot result
in both outcome i and outcome j. Hence, we obtain

$$\operatorname{Cov}(N_i, N_j) = -m p_i p_j$$

which is in accord with our intuition that $N_i$ and $N_j$ are negatively correlated.
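
As a sanity check on the result of Example 4f, the covariance can be computed directly by listing every possible sequence of trial outcomes. The probabilities and the number of trials below are arbitrary illustrative choices, and outcomes are indexed from 0 for convenience.

```python
from fractions import Fraction
from itertools import product

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical p_1, p_2, p_3
m = 4                                                  # number of trials

def count_moments(i, j):
    """Return E[N_i], E[N_j], E[N_i N_j] by enumerating all r^m outcome sequences."""
    e_i = e_j = e_ij = Fraction(0)
    for seq in product(range(len(p)), repeat=m):
        prob = Fraction(1)
        for outcome in seq:
            prob *= p[outcome]          # trials are independent
        n_i, n_j = seq.count(i), seq.count(j)
        e_i += prob * n_i
        e_j += prob * n_j
        e_ij += prob * n_i * n_j
    return e_i, e_j, e_ij

e0, e1, e01 = count_moments(0, 1)
cov = e01 - e0 * e1

print(cov, -m * p[0] * p[1])   # enumeration versus the formula -m p_i p_j
```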


7.5 Conditional Expectation 


7.5.1 Definitions 


Recall that if X and Y are jointly discrete random variables, then the conditional 
probability mass function of X, given that Y = y, is defined for all y such that 
P{Y = y} > 0, by 


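$$p_{X|Y}(x|y) = P\{X = x \mid Y = y\} = \frac{p(x, y)}{p_Y(y)}$$

It is therefore natural to define, in this case, the conditional expectation of X given
that Y = y, for all values of y such that $p_Y(y) > 0$, by

$$
\begin{aligned}
E[X \mid Y = y] &= \sum_{x} x P\{X = x \mid Y = y\} \\
&= \sum_{x} x\, p_{X|Y}(x|y)
\end{aligned}
$$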


Example 5a 

If X and Y are independent binomial random variables with identical parameters n
and p, calculate the conditional expected value of X given that X + Y = m.

Solution

Let us first calculate the conditional probability mass function of X given that
X + Y = m. For k ≤ min(n, m),

$$
\begin{aligned}
P\{X = k \mid X + Y = m\} &= \frac{P\{X = k, X + Y = m\}}{P\{X + Y = m\}} \\
&= \frac{P\{X = k, Y = m - k\}}{P\{X + Y = m\}} \\
&= \frac{P\{X = k\}P\{Y = m - k\}}{P\{X + Y = m\}} \\
&= \frac{\binom{n}{k}p^k(1 - p)^{n-k}\binom{n}{m-k}p^{m-k}(1 - p)^{n-m+k}}{\binom{2n}{m}p^m(1 - p)^{2n-m}} \\
&= \frac{\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}}
\end{aligned}
$$

where we have used the fact (see Example 3f of Chapter 6) that X + Y is
a binomial random variable with parameters 2n and p. Hence, the conditional
distribution of X, given that X + Y = m, is the hypergeometric distribution, and
from Example 2g, we obtain

$$E[X \mid X + Y = m] = \frac{m}{2}$$
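
The value m/2 does not depend on p, which is easy to confirm numerically. The following sketch computes the conditional expectation straight from the definition; the parameter values passed at the bottom are arbitrary examples, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p, k):
    """P{X = k} for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def cond_expectation(n, p, m):
    """E[X | X + Y = m] for independent binomial(n, p) variables X and Y."""
    ks = range(max(0, m - n), min(n, m) + 1)          # values of k with 0 <= m - k <= n
    joint = {k: binom_pmf(n, p, k) * binom_pmf(n, p, m - k) for k in ks}
    total = sum(joint.values())                        # equals P{X + Y = m}
    return sum(k * q for k, q in joint.items()) / total

print(cond_expectation(5, Fraction(1, 3), 4))    # m / 2 with m = 4
print(cond_expectation(5, Fraction(9, 10), 7))   # m / 2 with m = 7
```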


Similarly, let us recall that if X and Y are jointly continuous with a joint probability
density function f(x, y), then the conditional probability density of X, given that Y = y,
is defined for all values of y such that $f_Y(y) > 0$ by

$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}$$

It is natural, in this case, to define the conditional expectation of X, given that Y = y,
by

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x|y)\,dx$$

provided that $f_Y(y) > 0$.


Example 5b 
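Suppose that the joint density of X and Y is given by

$$f(x, y) = \frac{e^{-x/y}e^{-y}}{y} \qquad 0 < x < \infty,\ 0 < y < \infty$$

Compute E[X | Y = y].


Solution

We start by computing the conditional density

$$
\begin{aligned}
f_{X|Y}(x|y) &= \frac{f(x, y)}{f_Y(y)} \\
&= \frac{f(x, y)}{\int_{-\infty}^{\infty} f(x, y)\,dx} \\
&= \frac{(1/y)e^{-x/y}e^{-y}}{\int_{0}^{\infty} (1/y)e^{-x/y}e^{-y}\,dx} \\
&= \frac{(1/y)e^{-x/y}}{\int_{0}^{\infty} (1/y)e^{-x/y}\,dx} \\
&= \frac{1}{y}e^{-x/y}
\end{aligned}
$$

Hence, the conditional distribution of X, given that Y = y, is just the exponential
distribution with mean y. Thus,

$$E[X \mid Y = y] = \int_{0}^{\infty} \frac{x}{y}e^{-x/y}\,dx = y$$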


Remark Just as conditional probabilities satisfy all of the properties of ordinary
probabilities, so do conditional expectations satisfy the properties of ordinary
expectations. For instance, such formulas as

$$E[g(X) \mid Y = y] = \begin{cases} \sum_{x} g(x)\,p_{X|Y}(x|y) & \text{in the discrete case} \\ \int_{-\infty}^{\infty} g(x) f_{X|Y}(x|y)\,dx & \text{in the continuous case} \end{cases}$$

and

$$E\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = y\right] = \sum_{i=1}^{n} E[X_i \mid Y = y]$$


remain valid. As a matter of fact, conditional expectation given that Y = y can be 
thought of as being an ordinary expectation on a reduced sample space consisting 
only of outcomes for which Y = y. 


7.5.2 Computing expectations by conditioning 


Let us denote by E[X | Y] that function of the random variable Y whose value at Y = y 
is E[X|Y = y]. Note that E[X | Y] is itself a random variable. An extremely important 
property of conditional expectations is given by the following proposition. 


Proposition 5.1 The conditional expectation formula

$$E[X] = E[E[X|Y]] \tag{5.1}$$

If Y is a discrete random variable, then Equation (5.1) states that

$$E[X] = \sum_{y} E[X \mid Y = y]\,P\{Y = y\} \tag{5.1a}$$

whereas if Y is continuous with density $f_Y(y)$, then Equation (5.1) states

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy \tag{5.1b}$$

We now give a proof of Equation (5.1) in the case where X and Y are both
discrete random variables.


Proof of Equation (5.1) When X and Y Are Discrete: We must show that

$$E[X] = \sum_{y} E[X \mid Y = y]\,P\{Y = y\} \tag{5.2}$$

Now, the right-hand side of Equation (5.2) can be written as

$$
\begin{aligned}
\sum_{y} E[X \mid Y = y]\,P\{Y = y\} &= \sum_{y}\sum_{x} x P\{X = x \mid Y = y\}P\{Y = y\} \\
&= \sum_{y}\sum_{x} x\,\frac{P\{X = x, Y = y\}}{P\{Y = y\}}\,P\{Y = y\} \\
&= \sum_{y}\sum_{x} x P\{X = x, Y = y\} \\
&= \sum_{x} x \sum_{y} P\{X = x, Y = y\} \\
&= \sum_{x} x P\{X = x\} \\
&= E[X]
\end{aligned}
$$

and the result is proved.


One way to understand Equation (5.2) is to interpret it as follows: To calculate
E[X], we may take a weighted average of the conditional expected value of X given
that Y = y, each of the terms E[X | Y = y] being weighted by the probability of the
event on which it is conditioned. (Of what does this remind you?) This is an 
extremely useful result that often enables us to compute expectations easily by first 
conditioning on some appropriate random variable. The following examples illustrate 
its use. 


Example 5c 


A miner is trapped in a mine containing 3 doors. The first door leads to a tunnel 
that will take him to safety after 3 hours of travel. The second door leads to a 
tunnel that will return him to the mine after 5 hours of travel. The third door leads 
to a tunnel that will return him to the mine after 7 hours. If we assume that the 
miner is at all times equally likely to choose any one of the doors, what is the 
expected length of time until he reaches safety? 


Solution 


Let X denote the amount of time (in hours) until the miner reaches safety, and let 
Y denote the door he initially chooses. Now, 


$$
\begin{aligned}
E[X] &= E[X \mid Y = 1]P\{Y = 1\} + E[X \mid Y = 2]P\{Y = 2\} + E[X \mid Y = 3]P\{Y = 3\} \\
&= \frac{1}{3}\left(E[X \mid Y = 1] + E[X \mid Y = 2] + E[X \mid Y = 3]\right)
\end{aligned}
$$

However,

$$
\begin{aligned}
E[X \mid Y = 1] &= 3 \\
E[X \mid Y = 2] &= 5 + E[X] \\
E[X \mid Y = 3] &= 7 + E[X]
\end{aligned}
\tag{5.3}
$$

To understand why Equation (5.3) is correct, consider, for instance,
E[X | Y = 2] and reason as follows: If the miner chooses the second door, he
spends 5 hours in the tunnel and then returns to the mine. But once he returns to
the mine, the problem is as before; thus, his expected additional time until safety is
just E[X]. Hence, E[X | Y = 2] = 5 + E[X]. The argument behind the other
equalities in Equation (5.3) is similar. Hence,

$$E[X] = \frac{1}{3}\left(3 + 5 + E[X] + 7 + E[X]\right)$$

or

$$E[X] = 15$$
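
The answer of 15 hours can also be estimated by simulation. The sketch below is a plain Monte Carlo experiment and not part of the derivation; the seed and the number of trials are arbitrary choices, and the printed value is only an estimate.

```python
import random

def time_to_safety(rng):
    """Simulate one escape attempt; return the total hours until the miner is safe."""
    hours = 0
    while True:
        door = rng.randint(1, 3)          # every door is equally likely at every visit
        if door == 1:
            return hours + 3              # door 1 leads to safety after 3 hours
        hours += 5 if door == 2 else 7    # doors 2 and 3 return him to the mine

rng = random.Random(0)                    # fixed seed so the run is repeatable
trials = 200_000
estimate = sum(time_to_safety(rng) for _ in range(trials)) / trials

print(estimate)                           # should be close to 15
```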


Example 5d Expectation of a sum of a random number of random variables 


Suppose that the number of people entering a department store on a given day is 
a random variable with mean 50. Suppose further that the amounts of money 
spent by these customers are independent random variables having a common 
mean of $8. Finally, suppose also that the amount of money spent by a customer 
is also independent of the total number of customers who enter the store. What is 
the expected amount of money spent in the store on a given day? 


Solution 


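If we let N denote the number of customers who enter the store and $X_i$ the
amount spent by the ith such customer, then the total amount of money spent
can be expressed as $\sum_{i=1}^{N} X_i$. Now,

$$E\left[\sum_{1}^{N} X_i\right] = E\left[E\left[\sum_{1}^{N} X_i \,\Big|\, N\right]\right]$$

But

$$
\begin{aligned}
E\left[\sum_{1}^{N} X_i \,\Big|\, N = n\right] &= E\left[\sum_{1}^{n} X_i \,\Big|\, N = n\right] \\
&= E\left[\sum_{1}^{n} X_i\right] \quad \text{by the independence of the } X_i \text{ and } N \\
&= nE[X] \quad \text{where } E[X] = E[X_i]
\end{aligned}
$$

which implies that

$$E\left[\sum_{1}^{N} X_i \,\Big|\, N\right] = NE[X]$$

Thus,

$$E\left[\sum_{1}^{N} X_i\right] = E[NE[X]] = E[N]E[X]$$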

Hence, in our example, the expected amount of money spent in the store is 
50 x $8, or $400. 


Example 5e 


The game of craps is begun by rolling an ordinary pair of dice. If the sum of the
dice is 2, 3, or 12, the player loses. If it is 7 or 11, the player wins. If it is any other
number i, the player continues to roll the dice until the sum is either 7 or i. If it is
7, the player loses; if it is i, the player wins. Let R denote the number of rolls of
the dice in a game of craps. Find

a. E[R];

b. E[R | player wins];

c. E[R | player loses].
Solution 


If we let $P_i$ denote the probability that the sum of the dice is i, then

$$P_i = P_{14-i} = \frac{i - 1}{36}, \qquad i = 2, \ldots, 7$$

To compute E[R], we condition on S, the initial sum, giving

$$E[R] = \sum_{i=2}^{12} E[R \mid S = i]\,P_i$$

However,

$$E[R \mid S = i] = \begin{cases} 1, & \text{if } i = 2, 3, 7, 11, 12 \\ 1 + \dfrac{1}{P_i + P_7}, & \text{otherwise} \end{cases}$$

The preceding equation follows because if the sum is a value i that does not end
the game, then the dice will continue to be rolled until the sum is either i or 7, and
the number of rolls until this occurs is a geometric random variable with
parameter $P_i + P_7$. Therefore,

$$
\begin{aligned}
E[R] &= 1 + \sum_{i=4}^{6} \frac{P_i}{P_i + P_7} + \sum_{i=8}^{10} \frac{P_i}{P_i + P_7} \\
&= 1 + 2(3/9 + 4/10 + 5/11) = 3.376
\end{aligned}
$$


To determine E[R | win], let us start by determining p, the probability that the
player wins. Conditioning on S yields

$$
\begin{aligned}
p &= \sum_{i=2}^{12} P\{\text{win} \mid S = i\}P_i \\
&= P_7 + P_{11} + \sum_{i=4}^{6} \frac{P_i}{P_i + P_7}P_i + \sum_{i=8}^{10} \frac{P_i}{P_i + P_7}P_i \\
&= 0.493
\end{aligned}
$$

where the preceding uses the fact that the probability of obtaining a sum of i
before one of 7 is $P_i/(P_i + P_7)$. Now, let us determine the conditional probability
mass function of S, given that the player wins. Letting $Q_i = P\{S = i \mid \text{win}\}$, we
have

$$Q_2 = Q_3 = Q_{12} = 0, \qquad Q_7 = P_7/p, \qquad Q_{11} = P_{11}/p$$

and, for i = 4, 5, 6, 8, 9, 10,

$$
\begin{aligned}
Q_i &= \frac{P\{S = i, \text{win}\}}{P\{\text{win}\}} \\
&= \frac{P_i\,P\{\text{win} \mid S = i\}}{p} \\
&= \frac{P_i^2}{p(P_i + P_7)}
\end{aligned}
$$


Now, conditioning on the initial sum gives

$$E[R \mid \text{win}] = \sum_{i} E[R \mid \text{win}, S = i]\,Q_i$$

However, as was noted in Example 2j of Chapter 6, given that the initial
sum is i, the number of additional rolls needed and the outcome (whether a win
or a loss) are independent. (This is easily seen by first noting that conditional on
an initial sum of i, the outcome is independent of the number of additional dice
rolls needed and then using the symmetry property of independence, which
states that if event A is independent of event B, then event B is independent of
event A.) Therefore,

$$
\begin{aligned}
E[R \mid \text{win}] &= \sum_{i} E[R \mid S = i]\,Q_i \\
&= 1 + \sum_{i=4}^{6} \frac{Q_i}{P_i + P_7} + \sum_{i=8}^{10} \frac{Q_i}{P_i + P_7} \\
&= 2.938
\end{aligned}
$$

Although we could determine E[R | player loses] exactly as we did
E[R | player wins], it is easier to use

$$E[R] = E[R \mid \text{win}]\,p + E[R \mid \text{lose}](1 - p)$$

implying that

$$E[R \mid \text{lose}] = \frac{E[R] - E[R \mid \text{win}]\,p}{1 - p} = 3.801$$
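
The three numerical answers of Example 5e can be reproduced with exact fractions. The sketch below follows the same conditioning steps as the solution; the variable names are illustrative, and the computation of E[R | win] relies on the independence fact quoted in the solution.

```python
from fractions import Fraction

# P_i = P{sum of two dice is i}; 6 - |i - 7| counts the ways to roll i.
P = {i: Fraction(6 - abs(i - 7), 36) for i in range(2, 13)}
points = [4, 5, 6, 8, 9, 10]

def rolls_given_initial(i):
    """E[R | S = i]: one roll, plus a geometric number of rolls if i is a point."""
    return 1 + 1 / (P[i] + P[7]) if i in points else Fraction(1)

e_r = sum(rolls_given_initial(i) * P[i] for i in P)

# P{win | S = i}: certain for 7 and 11, P_i / (P_i + P_7) for a point, else 0.
win_given = {i: Fraction(0) for i in P}
win_given[7] = win_given[11] = Fraction(1)
for i in points:
    win_given[i] = P[i] / (P[i] + P[7])

p_win = sum(win_given[i] * P[i] for i in P)
Q = {i: P[i] * win_given[i] / p_win for i in P}       # Q_i = P{S = i | win}

e_r_win = sum(rolls_given_initial(i) * Q[i] for i in P)
e_r_lose = (e_r - e_r_win * p_win) / (1 - p_win)

print(float(e_r), float(p_win), float(e_r_win), float(e_r_lose))
```

Working with `Fraction` keeps every intermediate value exact, so the rounded decimals can be compared with the figures in the text without worrying about floating-point error.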


Example 5f 


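As defined in Example 5d of Chapter 6, the bivariate normal joint density
function of the random variables X and Y is

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x - \mu_x}{\sigma_x}\right)^2 + \left(\frac{y - \mu_y}{\sigma_y}\right)^2 - 2\rho\,\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y}\right]\right\}$$

We will now show that $\rho$ is the correlation between X and Y. As shown in
Example 5c, $\mu_x = E[X]$, $\sigma_x^2 = \operatorname{Var}(X)$, and $\mu_y = E[Y]$, $\sigma_y^2 = \operatorname{Var}(Y)$.
Consequently,

$$
\begin{aligned}
\operatorname{Corr}(X, Y) &= \frac{\operatorname{Cov}(X, Y)}{\sigma_x\sigma_y} \\
&= \frac{E[XY] - \mu_x\mu_y}{\sigma_x\sigma_y}
\end{aligned}
$$

To determine E[XY], we condition on Y. That is, we use the identity

$$E[XY] = E[E[XY \mid Y]]$$

Recalling from Example 5d that the conditional distribution of X given that
Y = y is normal with mean $\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)$, we see that

$$
\begin{aligned}
E[XY \mid Y = y] &= E[Xy \mid Y = y] \\
&= yE[X \mid Y = y] \\
&= y\left[\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y - \mu_y)\right] \\
&= y\mu_x + \rho\frac{\sigma_x}{\sigma_y}(y^2 - \mu_y y)
\end{aligned}
$$

Consequently,

$$E[XY \mid Y] = Y\mu_x + \rho\frac{\sigma_x}{\sigma_y}(Y^2 - \mu_y Y)$$

implying that

$$
\begin{aligned}
E[XY] &= E\left[Y\mu_x + \rho\frac{\sigma_x}{\sigma_y}(Y^2 - \mu_y Y)\right] \\
&= \mu_x E[Y] + \rho\frac{\sigma_x}{\sigma_y}E[Y^2 - \mu_y Y] \\
&= \mu_x\mu_y + \rho\frac{\sigma_x}{\sigma_y}\left(E[Y^2] - \mu_y^2\right) \\
&= \mu_x\mu_y + \rho\frac{\sigma_x}{\sigma_y}\operatorname{Var}(Y) \\
&= \mu_x\mu_y + \rho\sigma_x\sigma_y
\end{aligned}
$$

Therefore,

$$\operatorname{Corr}(X, Y) = \frac{\rho\sigma_x\sigma_y}{\sigma_x\sigma_y} = \rho$$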

Sometimes E[X] is easy to compute, and we use the conditioning identity to compute
a conditional expected value. This approach is illustrated by our next example.


Example 5g 


Consider n independent trials, each of which results in one of the outcomes
1, ..., k, with respective probabilities $p_1, \ldots, p_k$, $\sum_{i=1}^{k} p_i = 1$. Let $N_i$ denote the
number of trials that result in outcome i, i = 1, ..., k. For i ≠ j, find
(a) $E[N_j \mid N_i > 0]$ and (b) $E[N_j \mid N_i > 1]$.


Solution

To solve (a), let

$$I = \begin{cases} 0, & \text{if } N_i = 0 \\ 1, & \text{if } N_i > 0 \end{cases}$$

Then

$$E[N_j] = E[N_j \mid I = 0]P\{I = 0\} + E[N_j \mid I = 1]P\{I = 1\}$$

or, equivalently,

$$E[N_j] = E[N_j \mid N_i = 0]P\{N_i = 0\} + E[N_j \mid N_i > 0]P\{N_i > 0\}$$

Now, the unconditional distribution of $N_j$ is binomial with parameters n, $p_j$. Also,
given that $N_i = r$, each of the n - r trials that does not result in outcome i will,
independently, result in outcome j with probability $P(j \mid \text{not } i) = \frac{p_j}{1 - p_i}$.
Consequently, the conditional distribution of $N_j$, given that $N_i = r$, is binomial with
parameters n - r, $\frac{p_j}{1 - p_i}$. (For a more detailed argument for this conclusion, see
Example 4c of Chapter 6.) Because $P\{N_i = 0\} = (1 - p_i)^n$, the preceding
equation yields

$$np_j = n\frac{p_j}{1 - p_i}(1 - p_i)^n + E[N_j \mid N_i > 0]\left(1 - (1 - p_i)^n\right)$$

giving the result

$$E[N_j \mid N_i > 0] = np_j\,\frac{1 - (1 - p_i)^{n-1}}{1 - (1 - p_i)^n}$$

We can solve part (b) in a similar manner. Let

$$J = \begin{cases} 0, & \text{if } N_i = 0 \\ 1, & \text{if } N_i = 1 \\ 2, & \text{if } N_i > 1 \end{cases}$$

Then

$$E[N_j] = E[N_j \mid J = 0]P\{J = 0\} + E[N_j \mid J = 1]P\{J = 1\} + E[N_j \mid J = 2]P\{J = 2\}$$

or, equivalently,

$$E[N_j] = E[N_j \mid N_i = 0]P\{N_i = 0\} + E[N_j \mid N_i = 1]P\{N_i = 1\} + E[N_j \mid N_i > 1]P\{N_i > 1\}$$

This equation yields

$$np_j = n\frac{p_j}{1 - p_i}(1 - p_i)^n + (n - 1)\frac{p_j}{1 - p_i}\,np_i(1 - p_i)^{n-1} + E[N_j \mid N_i > 1]\left(1 - (1 - p_i)^n - np_i(1 - p_i)^{n-1}\right)$$

giving the result

$$E[N_j \mid N_i > 1] = \frac{np_j\left[1 - (1 - p_i)^{n-1} - (n - 1)p_i(1 - p_i)^{n-2}\right]}{1 - (1 - p_i)^n - np_i(1 - p_i)^{n-1}}$$
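
Both closed forms in Example 5g can be checked against a direct enumeration of all outcome sequences. The values of n and the probabilities below are assumptions picked to keep the enumeration small; outcomes are indexed from 0, so `i = 0` and `j = 1` stand for two distinct outcomes.

```python
from fractions import Fraction
from itertools import product

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical p_1, p_2, p_3
n = 4                                                  # number of trials
i, j = 0, 1                                            # zero-based outcome indices

def cond_mean_nj(min_ni):
    """E[N_j | N_i >= min_ni], by enumerating every sequence of n outcomes."""
    num = den = Fraction(0)
    for seq in product(range(len(p)), repeat=n):
        if seq.count(i) >= min_ni:
            prob = Fraction(1)
            for outcome in seq:
                prob *= p[outcome]
            num += prob * seq.count(j)
            den += prob
    return num / den

q = 1 - p[i]
formula_a = n * p[j] * (1 - q ** (n - 1)) / (1 - q ** n)
formula_b = (n * p[j] * (1 - q ** (n - 1) - (n - 1) * p[i] * q ** (n - 2))
             / (1 - q ** n - n * p[i] * q ** (n - 1)))

print(cond_mean_nj(1), formula_a)   # part (a): N_i > 0
print(cond_mean_nj(2), formula_b)   # part (b): N_i > 1
```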


It is also possible to obtain the variance of a random variable by conditioning. We 
illustrate this approach by the following example. 

Example 5h Variance of the geometric distribution 

Independent trials, each resulting in a success with probability p, are 

successively performed. Let N be the time of the first success. Find Var (N). 

Solution 

Let Y = 1 if the first trial results in a success and Y = 0 otherwise. Now,

$$\operatorname{Var}(N) = E[N^2] - (E[N])^2$$

To calculate $E[N^2]$, we condition on Y as follows:

$$E[N^2] = E[E[N^2 \mid Y]]$$

However,

$$
\begin{aligned}
E[N^2 \mid Y = 1] &= 1 \\
E[N^2 \mid Y = 0] &= E[(1 + N)^2]
\end{aligned}
$$

These two equations follow because, on the one hand, if the first trial results in a
success, then, clearly, N = 1; thus, $N^2 = 1$. On the other hand, if the first trial
results in a failure, then the total number of trials necessary for the first success
will have the same distribution as 1 (the first trial that results in failure) plus the
necessary number of additional trials. Since the latter quantity has the same
distribution as N, we obtain $E[N^2 \mid Y = 0] = E[(1 + N)^2]$. Hence,

$$
\begin{aligned}
E[N^2] &= E[N^2 \mid Y = 1]P\{Y = 1\} + E[N^2 \mid Y = 0]P\{Y = 0\} \\
&= p + (1 - p)E[(1 + N)^2] \\
&= 1 + (1 - p)E[2N + N^2]
\end{aligned}
$$

However, as was shown in Example 8b of Chapter 4, E[N] = 1/p;
therefore,

$$E[N^2] = 1 + \frac{2(1 - p)}{p} + (1 - p)E[N^2]$$

or

$$E[N^2] = \frac{2 - p}{p^2}$$

Consequently,

$$
\begin{aligned}
\operatorname{Var}(N) &= E[N^2] - (E[N])^2 \\
&= \frac{2 - p}{p^2} - \left(\frac{1}{p}\right)^2 \\
&= \frac{1 - p}{p^2}
\end{aligned}
$$

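
As a rough numerical cross-check of Example 5h, the first two moments of N can be approximated by summing the geometric probability mass function directly. This is a truncated-series sketch with an arbitrary value of p and an arbitrary cutoff, not an alternative derivation.

```python
def geometric_moments(p, terms=10_000):
    """Approximate E[N] and E[N^2] for P{N = k} = (1 - p)^(k - 1) p by a truncated sum."""
    mean = second = 0.0
    prob = p                        # P{N = 1}
    for k in range(1, terms + 1):
        mean += k * prob
        second += k * k * prob
        prob *= 1 - p               # P{N = k + 1}
    return mean, second

p = 0.3                             # hypothetical success probability
mean, second = geometric_moments(p)
variance = second - mean ** 2

print(second, (2 - p) / p ** 2)     # E[N^2] versus (2 - p) / p^2
print(variance, (1 - p) / p ** 2)   # Var(N) versus (1 - p) / p^2
```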
Example 5i 


Consider a gambling situation in which there are r players, with player i initially
having $n_i$ units, $n_i > 0$, i = 1, ..., r. At each stage, two of the players are chosen
to play a game, with the winner of the game receiving 1 unit from the loser. Any
player whose fortune drops to 0 is eliminated, and this continues until a single
player has all $n \equiv \sum_{i=1}^{r} n_i$ units, with that player designated as the victor.
Assuming that the results of successive games are independent and that each
game is equally likely to be won by either of its two players, find the average
number of stages until one of the players has all n units.


Solution 


To find the expected number of stages played, suppose first that there are only 2
players, with players 1 and 2 initially having j and n - j units, respectively. Let $X_j$
denote the number of stages that will be played, and let $m_j = E[X_j]$. Then, for
j = 1, ..., n - 1,

$$X_j = 1 + A_j$$

where $A_j$ is the additional number of stages needed beyond the first stage.
Taking expectations gives

$$m_j = 1 + E[A_j]$$

Conditioning on the result of the first stage then yields

$$m_j = 1 + E[A_j \mid 1 \text{ wins first stage}]\,\tfrac{1}{2} + E[A_j \mid 2 \text{ wins first stage}]\,\tfrac{1}{2}$$

Now, if player 1 wins at the first stage, then the situation from that point on is
exactly the same as in a problem that supposes that player 1 starts with j + 1
and player 2 with n - (j + 1) units. Consequently,

$$E[A_j \mid 1 \text{ wins first stage}] = m_{j+1}$$

and, analogously,

$$E[A_j \mid 2 \text{ wins first stage}] = m_{j-1}$$

Thus,

$$m_j = 1 + \frac{1}{2}m_{j+1} + \frac{1}{2}m_{j-1}$$

or, equivalently,

$$m_{j+1} = 2m_j - m_{j-1} - 2, \qquad j = 1, \ldots, n - 1 \tag{5.4}$$

Using that $m_0 = 0$, the preceding equation yields

$$
\begin{aligned}
m_2 &= 2m_1 - 2 \\
m_3 &= 2m_2 - m_1 - 2 = 3m_1 - 6 = 3(m_1 - 2) \\
m_4 &= 2m_3 - m_2 - 2 = 4m_1 - 12 = 4(m_1 - 3)
\end{aligned}
$$

suggesting that

$$m_i = i(m_1 - i + 1), \qquad i = 1, \ldots, n \tag{5.5}$$

To prove the preceding equality, we use mathematical induction. Since we’ve
already shown the equation to be true for i = 1, 2, we take as the induction
hypothesis that it is true whenever i ≤ j < n. Now we must prove that it is true for
j + 1. Using Equation (5.4) yields

$$
\begin{aligned}
m_{j+1} &= 2m_j - m_{j-1} - 2 \\
&= 2j(m_1 - j + 1) - (j - 1)(m_1 - j + 2) - 2 \quad \text{(by the induction hypothesis)} \\
&= (j + 1)m_1 - 2j^2 + 2j + j^2 - 3j + 2 - 2 \\
&= (j + 1)m_1 - j^2 - j \\
&= (j + 1)(m_1 - j)
\end{aligned}
$$

which completes the induction proof of (5.5). Letting i = n in (5.5), and
using that $m_n = 0$, now yields that

$$m_1 = n - 1$$

which, again using (5.5), gives the result

$$m_i = i(n - i)$$


Thus, the mean number of games played when there are only 2 players with 
initial amounts i and n — i is the product of their initial amounts. Because both 
players play all stages, this is also the mean number of stages involving player 1. 
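
The two-player result can be reproduced mechanically from recursion (5.4) and the boundary conditions $m_0 = m_n = 0$. The sketch below mirrors the argument in the text by carrying each $m_j$ as a linear expression in the unknown $m_1$; the function name and the choice n = 10 are illustrative.

```python
from fractions import Fraction

def expected_stages(n):
    """Return [m_0, ..., m_n] solving (5.4) with m_0 = m_n = 0."""
    # Represent m_j as a[j] + b[j] * m_1, where m_1 is not yet known.
    a = [Fraction(0), Fraction(0)]          # m_0 = 0, m_1 = 0 + 1 * m_1
    b = [Fraction(0), Fraction(1)]
    for j in range(1, n):
        a.append(2 * a[j] - a[j - 1] - 2)   # m_{j+1} = 2 m_j - m_{j-1} - 2
        b.append(2 * b[j] - b[j - 1])
    m1 = -a[n] / b[n]                       # impose the boundary condition m_n = 0
    return [a[j] + b[j] * m1 for j in range(n + 1)]

n = 10
m = expected_stages(n)

print(m[1])                                 # m_1, which the text shows equals n - 1
print([int(x) for x in m])                  # compare with i * (n - i)
```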


Now let us return to the problem involving r players with initial amounts $n_i$, i = 1,
..., r, $\sum_{i=1}^{r} n_i = n$. Let X denote the number of stages needed to obtain a victor, and
let $X_i$ denote the number of stages involving player i. Now, from the point of view
of player i, starting with $n_i$, he will continue to play stages, independently being
equally likely to win or lose each one, until his fortune is either n or 0. Thus, the
number of stages he plays is exactly the same as when he has a single opponent
with an initial fortune of $n - n_i$. Consequently, by the preceding result, it follows
that

$$E[X_i] = n_i(n - n_i)$$

so

$$E\left[\sum_{i=1}^{r} X_i\right] = \sum_{i=1}^{r} n_i(n - n_i) = n^2 - \sum_{i=1}^{r} n_i^2$$

But because each stage involves two players,

$$X = \frac{1}{2}\sum_{i=1}^{r} X_i$$

Taking expectations now yields

$$E[X] = \frac{1}{2}\left(n^2 - \sum_{i=1}^{r} n_i^2\right)$$

It is interesting to note that while our argument shows that the mean number of
stages does not depend on the manner in which the teams are selected at each
stage, the same is not true for the distribution of the number of stages. To see
this, suppose r = 3, $n_1 = n_2 = 1$, and $n_3 = 2$. If players 1 and 2 are chosen in
the first stage, then it will take at least three stages to determine a winner,
whereas if player 3 is in the first stage, then it is possible for there to be only two
stages.


In our next example, we use conditioning to verify a result previously noted in 
Section 6.3.1: that the expected number of uniform (0,1) random variables that 
need to be added for their sum to exceed 1 is equal to e. 


Example 5j 
Let $U_1, U_2, \ldots$ be a sequence of independent uniform (0, 1) random variables. Find
E[N] when

$$N = \min\left\{n : \sum_{i=1}^{n} U_i > 1\right\}$$

Solution

We will find E[N] by obtaining a more general result. For $x \in [0, 1]$, let

$$N(x) = \min\left\{n : \sum_{i=1}^{n} U_i > x\right\}$$

and set

$$m(x) = E[N(x)]$$

That is, N(x) is the number of uniform (0, 1) random variables we must add until
their sum exceeds x, and m(x) is its expected value. We will now derive an
equation for m(x) by conditioning on $U_1$. This gives, from Equation (5.1b),

$$m(x) = \int_{0}^{1} E[N(x) \mid U_1 = y]\,dy \tag{5.6}$$

Now,

$$E[N(x) \mid U_1 = y] = \begin{cases} 1 & \text{if } y > x \\ 1 + m(x - y) & \text{if } y \le x \end{cases} \tag{5.7}$$

The preceding formula is obviously true when y > x. It is also true when y ≤ x,
since, if the first uniform value is y, then, at that point, the remaining number of
uniform random variables needed is the same as if we were just starting and
were going to add uniform random variables until their sum exceeded x - y.
Substituting Equation (5.7) into Equation (5.6) gives

$$
\begin{aligned}
m(x) &= 1 + \int_{0}^{x} m(x - y)\,dy \\
&= 1 + \int_{0}^{x} m(u)\,du \quad \text{by letting } u = x - y
\end{aligned}
$$

Differentiating the preceding equation yields

$$m'(x) = m(x)$$

or, equivalently,

$$\frac{m'(x)}{m(x)} = 1$$

Integrating this equation gives

$$\log[m(x)] = x + c$$

or

$$m(x) = ke^{x}$$

Since m(0) = 1, it follows that k = 1, so we obtain

$$m(x) = e^{x}$$

Therefore, m(1), the expected number of uniform (0, 1) random variables that
need to be added until their sum exceeds 1, is equal to e.
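
The conclusion m(1) = e is easy to test empirically. The following Monte Carlo sketch is only an illustration (seed and trial count are arbitrary), and its output is an estimate rather than an exact value.

```python
import math
import random

def count_until_exceeds(x, rng):
    """Number of uniform (0, 1) draws needed for the running sum to exceed x."""
    total, count = 0.0, 0
    while total <= x:
        total += rng.random()
        count += 1
    return count

rng = random.Random(1)     # fixed seed so the run is repeatable
trials = 200_000
estimate = sum(count_until_exceeds(1.0, rng) for _ in range(trials)) / trials

print(estimate, math.e)    # the estimate should be close to e
```

Replacing 1.0 by another x in [0, 1] gives an estimate of m(x), which can be compared with `math.exp(x)` in the same way.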


7.5.3 Computing probabilities by conditioning 


Not only can we obtain expectations by first conditioning on an appropriate random
variable, but we can also use this approach to compute probabilities. To see this, let
A denote an arbitrary event, and define the indicator random variable X by

$$X = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A \text{ does not occur} \end{cases}$$

It follows from the definition of X that

$$
\begin{aligned}
E[X] &= P(A) \\
E[X \mid Y = y] &= P(A \mid Y = y) \quad \text{for any random variable } Y
\end{aligned}
$$

Therefore, from Equations (5.1a) and (5.1b), we obtain

$$
P(A) = \begin{cases} \sum_{y} P(A \mid Y = y)P(Y = y) & \text{if } Y \text{ is discrete} \\ \int_{-\infty}^{\infty} P(A \mid Y = y)f_Y(y)\,dy & \text{if } Y \text{ is continuous} \end{cases}
\tag{5.8}
$$

Note that if Y is a discrete random variable taking on one of the values $y_1, \ldots, y_n$, then
by defining the events $B_i$, i = 1, ..., n, by $B_i = \{Y = y_i\}$, Equation (5.8) reduces to
the familiar equation

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)P(B_i)$$

where $B_1, \ldots, B_n$ are mutually exclusive events whose union is the sample space.


Example 5k The best-prize problem 


Suppose that we are to be presented with n distinct prizes, in sequence. After 
being presented with a prize, we must immediately decide whether to accept it or 
to reject it and consider the next prize. The only information we are given when 
deciding whether to accept a prize is the relative rank of that prize compared to 
ones already seen. That is, for instance, when the fifth prize is presented, we 
learn how it compares with the four prizes we’ve already seen. Suppose that 
once a prize is rejected, it is lost, and that our objective is to maximize the 
probability of obtaining the best prize. Assuming that all n! orderings of the prizes 
are equally likely, how well can we do? 


Solution 


Rather surprisingly, we can do quite well. To see this, fix a value k,0 < k < n, and 
consider the strategy that rejects the first k prizes and then accepts the first one 
that is better than all of those first k. Let P;,( best ) denote the probability that the 
best prize is selected when this strategy is employed. To compute this probability, 
condition on X, the position of the best prize. This gives 


P,( best ) = = P,( best |X = DP(X =) 
i=1 


n 
1 > 
=a P;,( best |X = i) 


i=1 


Now, on the one hand, if the overall best prize is among the first k, then no prize
is ever selected under the strategy considered. That is,

\[
P_k(\text{best} \mid X = i) = 0 \quad \text{if } i \le k
\]

On the other hand, if the best prize is in position i, where i > k, then the best
prize will be selected if the best of the first i − 1 prizes is among the first k (for
then none of the prizes in positions k + 1, k + 2, ..., i − 1 would be selected). But,
conditional on the best prize being in position i, it is easy to verify that all possible
orderings of the other prizes remain equally likely, which implies that each of the
first i − 1 prizes is equally likely to be the best of that batch. Hence, we have

\[
P_k(\text{best} \mid X = i) = P\{\text{best of first } i - 1 \text{ is among the first } k \mid X = i\}
                            = \frac{k}{i - 1} \quad \text{if } i > k
\]


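From the preceding, we obtain

\[
P_k(\text{best}) = \frac{k}{n} \sum_{i=k+1}^{n} \frac{1}{i - 1}
                 \approx \frac{k}{n} \int_{k+1}^{n} \frac{1}{x - 1}\, dx
                 = \frac{k}{n} \log\!\left(\frac{n - 1}{k}\right)
                 \approx \frac{k}{n} \log\!\left(\frac{n}{k}\right)
\]

Now, if we consider the function

\[
g(x) = \frac{x}{n} \log\!\left(\frac{n}{x}\right)
\]

then

\[
g'(x) = \frac{1}{n} \log\!\left(\frac{n}{x}\right) - \frac{1}{n}
\]

so

\[
g'(x) = 0 \;\Rightarrow\; \log\!\left(\frac{n}{x}\right) = 1 \;\Rightarrow\; x = \frac{n}{e}
\]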


Thus, since $P_k(\text{best}) \approx g(k)$, we see that the best strategy of the type
considered is to let the first n/e prizes go by and then accept the first one to
appear that is better than all of those. In addition, since g(n/e) = 1/e, the
probability that this strategy selects the best prize is approximately 1/e ≈ .36788.
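The approximation can be checked against the exact sum $P_k(\text{best}) = \frac{k}{n}\sum_{i=k+1}^{n}\frac{1}{i-1}$. The following is a minimal Python sketch, not part of the original text; the function names `p_best` and `best_cutoff` are illustrative assumptions. It evaluates the exact probability for every cutoff k and reports the maximizer, which should fall close to n/e with a success probability close to 1/e.

```python
from math import e


def p_best(n: int, k: int) -> float:
    """Exact P_k(best) = (k/n) * sum_{i=k+1}^{n} 1/(i-1), valid for 1 <= k < n."""
    return (k / n) * sum(1.0 / (i - 1) for i in range(k + 1, n + 1))


def best_cutoff(n: int) -> tuple[int, float]:
    """Return the cutoff k in 1..n-1 that maximizes p_best(n, k), with its probability."""
    k = max(range(1, n), key=lambda c: p_best(n, c))
    return k, p_best(n, k)


if __name__ == "__main__":
    n = 100
    k, p = best_cutoff(n)
    # Compare the exact optimum with the n/e and 1/e approximations.
    print(k, round(n / e, 2), round(p, 4), round(1 / e, 4))
```

For moderate n the exact optimum and the n/e rule agree to within one position, which is why the continuous approximation g(x) is adequate in practice.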


Remark Most people are quite surprised by the size of the probability of 
obtaining the best prize, thinking that this probability would be close to 0 when n 
is large. However, even without going through the calculations, a little thought 
reveals that the probability of obtaining the best prize can be made reasonably 
large. Consider the strategy of letting half of the prizes go by and then selecting
the first one to appear that is better than all of those. The probability that a prize
is actually selected is the probability that the overall best is among the second
half, and this is 1/2. In addition, given that a prize is selected, at the time of
selection that prize would have been the best of more than n/2 prizes to have
appeared and would thus have probability of at least 1/2 of being the overall best.
Hence, the strategy of letting the first half of all prizes go by and then accepting
the first one that is better than all of those prizes has a probability greater than 1/4
of obtaining the best prize.


Example 5l 


Let U be a uniform random variable on (0, 1), and suppose that the conditional 
distribution of X, given that U = p, is binomial with parameters n and p. Find the 
probability mass function of X.


Solution 


Conditioning on the value of U gives

\[
P\{X = i\} = \int_0^1 P\{X = i \mid U = p\}\, f_U(p)\, dp
           = \int_0^1 \binom{n}{i} p^i (1 - p)^{n-i}\, dp
           = \frac{n!}{i!\,(n - i)!} \int_0^1 p^i (1 - p)^{n-i}\, dp
\]

Now, it can be shown (a probabilistic proof is given in Section 6.6) that

\[
\int_0^1 p^i (1 - p)^{n-i}\, dp = \frac{i!\,(n - i)!}{(n + 1)!}
\]

Hence, we obtain

\[
P\{X = i\} = \frac{1}{n + 1}, \quad i = 0, \ldots, n
\]


That is, we obtain the surprising result that if a coin whose probability of coming 
up heads is uniformly distributed over (0, 1) is flipped n times, then the number of 
heads occurring is equally likely to be any of the values 0, ...,n. 


Because the preceding conditional distribution has such a nice form, it is worth
trying to find another argument to enhance our intuition as to why such a result is
true. To do so, let $U, U_1, \ldots, U_n$ be n + 1 independent uniform (0, 1) random
variables, and let X denote the number of the random variables $U_1, \ldots, U_n$ that are
smaller than U. Since all the random variables $U, U_1, \ldots, U_n$ have the same
distribution, it follows that U is equally likely to be the smallest, or second
smallest, ..., or largest of them; so X is equally likely to be any of the values 0, 1, ..., n.
However, given that U = p, the number of the $U_i$ that are less than U is a binomial
random variable with parameters n and p, thus establishing our previous result.
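The uniform answer 1/(n + 1) can also be confirmed by exact arithmetic. The sketch below is an illustration rather than the text's method: it expands $(1-p)^{n-i}$ by the binomial theorem and integrates each power of p term by term, so it does not rely on the beta-integral identity quoted above. The function name is a hypothetical choice.

```python
from fractions import Fraction
from math import comb


def pmf_uniform_mixed_binomial(n: int, i: int) -> Fraction:
    """P{X = i} = C(n, i) * integral_0^1 p^i (1 - p)^(n - i) dp, computed exactly.

    (1 - p)^(n - i) = sum_j C(n - i, j) (-1)^j p^j, and the integral of
    p^(i + j) over (0, 1) is 1 / (i + j + 1).
    """
    integral = sum(
        Fraction((-1) ** j * comb(n - i, j), i + j + 1)
        for j in range(n - i + 1)
    )
    return comb(n, i) * integral


if __name__ == "__main__":
    n = 6
    print([pmf_uniform_mixed_binomial(n, i) for i in range(n + 1)])
```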


Example 5m 


A random sample of X balls is chosen from an urn that contains n red and m blue 
balls. If X is equally likely to be any of the values 1, ...,n, find the probability that 
all the balls in the sample are red. 


Solution 


Conditioning on X yields

\[
P(\text{all balls are red}) = \sum_{i=1}^{n} P(\text{all balls are red} \mid X = i)\, P(X = i)
\]

Now, given that the sample is of size i, each of the $\binom{n+m}{i}$ subsets of size i is
equally likely to be the chosen set of balls. As $\binom{n}{i}$ of these subsets have all red
balls, it follows that

\[
P\{\text{all balls are red} \mid X = i\} = \frac{\binom{n}{i}}{\binom{n+m}{i}}
\]

and thus that

\[
P(\text{all balls are red}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\binom{n}{i}}{\binom{n+m}{i}}
\]

However, though not obvious, it turns out that the preceding can be simplified,
and indeed yields the surprising result that

\[
P(\text{all balls are red}) = \frac{1}{m + 1} \quad \text{for all } n, m
\]


To prove the preceding formula, we will not make use of our earlier result, but
rather we will use induction on n. When n = 1, the urn contains 1 red and m blue
balls and so a random sample of size 1 will be red with probability 1/(m + 1). So,
assume the result is true whenever the urn contains n − 1 red and m blue balls
and a random sample whose size is equally likely to be any of 1, ..., n − 1 is to be
chosen. Now consider the case of n red and m blue balls. Start by conditioning
not on the value of X but only on whether or not X = 1. This yields

\[
\begin{aligned}
P(\text{all balls are red}) &= P(\text{all red} \mid X = 1) P(X = 1) + P(\text{all red} \mid X > 1) P(X > 1) \\
&= \frac{n}{n + m} \cdot \frac{1}{n} + P(\text{all red} \mid X > 1)\, \frac{n - 1}{n}
\end{aligned}
\]

Now, if X > 1 then in order for all balls in the sample to be red, the first one
chosen must be red, which occurs with probability n/(n + m), and then all of the
X − 1 remaining balls in the sample must be red. But given that the first ball
chosen is red, the remaining X − 1 balls will be randomly selected from an urn
containing n − 1 red and m blue balls. As X − 1, given that X > 1, is equally likely
to be any of the values 1, ..., n − 1, it follows by the induction hypothesis that

\[
P(\text{all balls are red} \mid X > 1) = \frac{n}{n + m} \cdot \frac{1}{m + 1}
\]

Thus,

\[
\begin{aligned}
P(\text{all balls are red}) &= \frac{1}{n + m} + \frac{n}{n + m} \cdot \frac{1}{m + 1} \cdot \frac{n - 1}{n} \\
&= \frac{1}{n + m} + \frac{n - 1}{(n + m)(m + 1)} \\
&= \frac{1}{m + 1}
\end{aligned}
\]
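As a sanity check on the induction, the conditioning sum $\frac{1}{n}\sum_{i=1}^{n}\binom{n}{i}/\binom{n+m}{i}$ can be evaluated exactly for small urns. This is a minimal sketch using exact rational arithmetic; the function name `p_all_red` is an assumption made for illustration.

```python
from fractions import Fraction
from math import comb


def p_all_red(n: int, m: int) -> Fraction:
    """(1/n) * sum_{i=1}^{n} C(n, i) / C(n + m, i): sample size uniform on 1..n."""
    return Fraction(1, n) * sum(
        Fraction(comb(n, i), comb(n + m, i)) for i in range(1, n + 1)
    )


if __name__ == "__main__":
    # Each entry should equal 1 / (m + 1), whatever the value of n.
    for n in (1, 2, 5):
        print(n, [p_all_red(n, m) for m in range(4)])
```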
Example 5n 


Suppose that X and Y are independent continuous random variables having 
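densities $f_X$ and $f_Y$, respectively. Compute P{X < Y}.


Solution


Conditioning on the value of Y yields

\[
\begin{aligned}
P\{X < Y\} &= \int_{-\infty}^{\infty} P\{X < Y \mid Y = y\}\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} P\{X < y \mid Y = y\}\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} P\{X < y\}\, f_Y(y)\, dy \qquad \text{by independence} \\
&= \int_{-\infty}^{\infty} F_X(y)\, f_Y(y)\, dy
\end{aligned}
\]

where

\[
F_X(y) = \int_{-\infty}^{y} f_X(x)\, dx
\]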

Example 5o


Suppose that X and Y are independent continuous random variables. Find the 
distribution function and the density function of X + Y. 


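Solution


By conditioning on the value of Y, we obtain

\[
\begin{aligned}
P\{X + Y < a\} &= \int_{-\infty}^{\infty} P\{X + Y < a \mid Y = y\}\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} P\{X + y < a \mid Y = y\}\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} P\{X < a - y\}\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} F_X(a - y)\, f_Y(y)\, dy
\end{aligned}
\]

Differentiation yields the density function of X + Y:

\[
\begin{aligned}
f_{X+Y}(a) &= \frac{d}{da} \int_{-\infty}^{\infty} F_X(a - y)\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} \frac{d}{da} F_X(a - y)\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} f_X(a - y)\, f_Y(y)\, dy
\end{aligned}
\]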
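The convolution integral is easy to evaluate numerically. The sketch below is an illustration under stated assumptions: X and Y are taken to be independent exponentials with rate 1 (a choice made here, not in the text), for which the sum is known to have density $a e^{-a}$ for a > 0, and the integral is approximated with a simple midpoint rule. The helper names are hypothetical.

```python
from math import exp


def exp_density(x: float, rate: float = 1.0) -> float:
    """Density of an exponential random variable with the given rate."""
    return rate * exp(-rate * x) if x >= 0 else 0.0


def convolve_density(f_x, f_y, a: float, lo: float, hi: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of integral_{lo}^{hi} f_X(a - y) f_Y(y) dy."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        y = lo + (j + 0.5) * h
        total += f_x(a - y) * f_y(y)
    return h * total


if __name__ == "__main__":
    a = 2.0
    # Both densities vanish outside [0, a] in the integrand, so integrate over [0, a].
    approx = convolve_density(exp_density, exp_density, a, 0.0, a)
    print(approx, a * exp(-a))
```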


7.5.4 Conditional variance 


Just as we have defined the conditional expectation of X given the value of Y, we can 
also define the conditional variance of X given that Y = y: 


\[
\mathrm{Var}(X \mid Y) = E\big[(X - E[X \mid Y])^2 \mid Y\big]
\]


That is, Var (X | Y) is equal to the (conditional) expected square of the difference 
between X and its (conditional) mean when the value of Y is given. In other words, 
Var (X | Y) is exactly analogous to the usual definition of variance, but now all 
expectations are conditional on the fact that Y is known. 


There is a very useful relationship between Var(X), the unconditional variance of X,
and Var(X|Y), the conditional variance of X given Y, that can often be applied to
compute Var(X). To obtain this relationship, note first that by the same reasoning
that yields $\mathrm{Var}(X) = E[X^2] - (E[X])^2$, we have

\[
\mathrm{Var}(X \mid Y) = E[X^2 \mid Y] - (E[X \mid Y])^2
\]

so

\[
\begin{aligned}
E[\mathrm{Var}(X \mid Y)] &= E\big[E[X^2 \mid Y]\big] - E\big[(E[X \mid Y])^2\big] \\
&= E[X^2] - E\big[(E[X \mid Y])^2\big]
\end{aligned}
\tag{5.9}
\]

Also, since E[E[X|Y]] = E[X], we have

\[
\mathrm{Var}(E[X \mid Y]) = E\big[(E[X \mid Y])^2\big] - (E[X])^2
\tag{5.10}
\]

Hence, by adding Equations (5.9) and (5.10), we arrive at the following
proposition.

Proposition 5.2 The Conditional Variance Formula

\[
\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])
\]
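The decomposition can be verified exactly on any finite joint distribution. The following Python sketch uses a small joint probability mass function that is purely hypothetical (chosen here for illustration, not taken from the text) and computes the three quantities in the formula with exact fractions.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y), chosen only for illustration.
JOINT = {(0, 0): F(1, 8), (1, 0): F(3, 8), (0, 1): F(1, 4), (2, 1): F(1, 4)}


def variance_decomposition(joint):
    """Return (Var(X), E[Var(X|Y)], Var(E[X|Y])) for a finite joint pmf."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    mean_x = sum(x * p for (x, _), p in joint.items())
    var_x = sum((x - mean_x) ** 2 * p for (x, _), p in joint.items())

    expected_cond_var = F(0)
    var_cond_mean = F(0)
    for y, py in p_y.items():
        # Conditional pmf of X given Y = y.
        cond = {x: p / py for (x, yy), p in joint.items() if yy == y}
        m = sum(x * q for x, q in cond.items())
        v = sum((x - m) ** 2 * q for x, q in cond.items())
        expected_cond_var += v * py
        var_cond_mean += (m - mean_x) ** 2 * py
    return var_x, expected_cond_var, var_cond_mean


if __name__ == "__main__":
    total, within, between = variance_decomposition(JOINT)
    print(total, within, between, total == within + between)
```

The two terms are often read as "within-group" and "between-group" variance, which is why the variable names above use those words.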


Example 5p 


Suppose that by any time t the number of people who have arrived at a train
depot is a Poisson random variable with mean λt. If the initial train arrives at the
depot at a time (independent of when the passengers arrive) that is uniformly
distributed over (0, T), what are the mean and variance of the number of
passengers who enter the train?


Solution 


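For each t ≥ 0, let N(t) denote the number of arrivals by t, and let Y denote the
time at which the train arrives. The random variable of interest is then N(Y).
Conditioning on Y gives

\[
\begin{aligned}
E[N(Y) \mid Y = t] &= E[N(t) \mid Y = t] \\
&= E[N(t)] \qquad \text{by the independence of } Y \text{ and } N(t) \\
&= \lambda t \qquad \text{since } N(t) \text{ is Poisson with mean } \lambda t
\end{aligned}
\]

Hence,

\[
E[N(Y) \mid Y] = \lambda Y
\]

so taking expectations gives

\[
E[N(Y)] = \lambda E[Y] = \frac{\lambda T}{2}
\]

To obtain Var(N(Y)), we use the conditional variance formula:

\[
\begin{aligned}
\mathrm{Var}(N(Y) \mid Y = t) &= \mathrm{Var}(N(t) \mid Y = t) \\
&= \mathrm{Var}(N(t)) \qquad \text{by independence} \\
&= \lambda t
\end{aligned}
\]

Thus,

\[
\mathrm{Var}(N(Y) \mid Y) = \lambda Y, \qquad E[N(Y) \mid Y] = \lambda Y
\]

Hence, from the conditional variance formula,

\[
\begin{aligned}
\mathrm{Var}(N(Y)) &= E[\lambda Y] + \mathrm{Var}(\lambda Y) \\
&= \frac{\lambda T}{2} + \frac{\lambda^2 T^2}{12}
\end{aligned}
\]

where we have used the fact that $\mathrm{Var}(Y) = T^2/12$.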


Example 5q Variance of a sum of a random number of random variables 


Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, and let N be a nonnegative integer-valued random variable that is
independent of the sequence $X_i$, $i \ge 1$. To compute $\mathrm{Var}\big(\sum_{i=1}^{N} X_i\big)$, we condition
on N:

\[
E\Big[\sum_{i=1}^{N} X_i \,\Big|\, N\Big] = N E[X], \qquad
\mathrm{Var}\Big(\sum_{i=1}^{N} X_i \,\Big|\, N\Big) = N\, \mathrm{Var}(X)
\]

The preceding result follows because, given N, $\sum_{i=1}^{N} X_i$ is just the sum of a fixed
number of independent random variables, so its expectation and variance are
just the sums of the individual means and variances, respectively. Hence, from
the conditional variance formula,

\[
\mathrm{Var}\Big(\sum_{i=1}^{N} X_i\Big) = E[N]\, \mathrm{Var}(X) + (E[X])^2\, \mathrm{Var}(N)
\]
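The random-sum variance formula can be checked exactly by building the distribution of the sum directly. This is a sketch under illustrative assumptions: both N and the $X_i$ have small finite supports (the particular pmfs in the usage example are hypothetical), and the pmf of the random sum is obtained by repeated convolution.

```python
from fractions import Fraction as F


def moments(pmf):
    """Return (mean, variance) of a finite pmf given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    var = sum((v - mean) ** 2 * p for v, p in pmf.items())
    return mean, var


def random_sum_pmf(pmf_n, pmf_x):
    """pmf of X_1 + ... + X_N, with N independent of the i.i.d. X_i."""
    total = {}
    n_fold = {0: F(1)}  # pmf of an empty sum (n = 0 terms)
    for n in range(max(pmf_n) + 1):
        # Mix in the n-fold convolution with weight P{N = n}.
        for s, p in n_fold.items():
            total[s] = total.get(s, 0) + pmf_n.get(n, 0) * p
        # Convolve once more to get the (n + 1)-fold sum.
        nxt = {}
        for s, p in n_fold.items():
            for x, q in pmf_x.items():
                nxt[s + x] = nxt.get(s + x, 0) + p * q
        n_fold = nxt
    return total


if __name__ == "__main__":
    pmf_n = {0: F(1, 4), 1: F(1, 4), 2: F(1, 2)}  # hypothetical pmf of N
    pmf_x = {1: F(1, 3), 3: F(2, 3)}              # hypothetical pmf of each X_i
    mean_n, var_n = moments(pmf_n)
    mean_x, var_x = moments(pmf_x)
    _, var_sum = moments(random_sum_pmf(pmf_n, pmf_x))
    print(var_sum, mean_n * var_x + mean_x ** 2 * var_n)
```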


7.6 Conditional Expectation and Prediction 


Sometimes a situation arises in which the value of a random variable X is observed 
and then, on the basis of the observed value, an attempt is made to predict the value 
of a second random variable Y. Let g(X) denote the predictor; that is, if X is observed 
to equal x, then g(x) is our prediction for the value of Y. Clearly, we would like to 
choose g so that g(X) tends to be close to Y. One possible criterion for closeness is 
to choose g so as to minimize $E[(Y - g(X))^2]$. We now show that, under this
criterion, the best possible predictor of Y is g(X) = E[Y|X].


Proposition 6.1

\[
E[(Y - g(X))^2] \ge E[(Y - E[Y \mid X])^2]
\]

Proof

\[
\begin{aligned}
E[(Y - g(X))^2 \mid X] &= E[(Y - E[Y \mid X] + E[Y \mid X] - g(X))^2 \mid X] \\
&= E[(Y - E[Y \mid X])^2 \mid X] \\
&\quad + E[(E[Y \mid X] - g(X))^2 \mid X] \\
&\quad + 2E[(Y - E[Y \mid X])(E[Y \mid X] - g(X)) \mid X]
\end{aligned}
\tag{6.1}
\]

However, given X, E[Y|X] − g(X), being a function of X, can be treated as a
constant. Thus,

\[
\begin{aligned}
E[(Y - E[Y \mid X])(E[Y \mid X] - g(X)) \mid X] &= (E[Y \mid X] - g(X))\, E[Y - E[Y \mid X] \mid X] \\
&= (E[Y \mid X] - g(X))(E[Y \mid X] - E[Y \mid X]) \\
&= 0
\end{aligned}
\tag{6.2}
\]

Hence, from Equations (6.1) and (6.2), we obtain

\[
E[(Y - g(X))^2 \mid X] \ge E[(Y - E[Y \mid X])^2 \mid X]
\]

and the desired result follows by taking expectations of both sides of the
preceding expression.


Remark A second, more intuitive, although less rigorous, argument verifying
Proposition 6.1 is as follows: It is straightforward to verify that $E[(Y - c)^2]$ is
minimized at c = E[Y]. (See Theoretical Exercise 1.) Thus, if we want to predict
the value of Y when there are no data available to use, the best possible 
prediction, in the sense of minimizing the mean square error, is to predict that Y 
will equal its mean. However, if the value of the random variable X is observed to 
be x, then the prediction problem remains exactly as in the previous (no-data) 
case, with the exception that all probabilities and expectations are now 
conditional on the event that X = x. Hence, the best prediction in this situation is 
to predict that Y will equal its conditional expected value given that X = x, thus 
establishing Proposition 6.1.


Example 6a 


Suppose that the son of a man of height x (in inches) attains a height that is
normally distributed with mean x + 1 and variance 4. What is the best prediction 
of the height at full growth of the son of a man who is 6 feet tall? 


Solution 


Formally, this model can be written as 


Y=X+1+e 


where e is a normal random variable, independent of X, having mean 0 and 
variance 4. The X and Y, of course, represent the heights of the man and his son, 
respectively. The best prediction E[Y | X = 72] is thus equal to 


\[
\begin{aligned}
E[Y \mid X = 72] &= E[X + 1 + e \mid X = 72] \\
&= 73 + E[e \mid X = 72] \\
&= 73 + E(e) \qquad \text{by independence} \\
&= 73
\end{aligned}
\]


Example 6b 


Suppose that if a signal value s is sent from location A, then the signal value
received at location B is normally distributed with parameters (s, 1). If S, the value
of the signal sent at A, is normally distributed with parameters $(\mu, \sigma^2)$, what is the
best estimate of the signal sent if R, the value received at B, is equal to r?


Solution 


Let us start by computing the conditional density of S given R. We have

\[
\begin{aligned}
f_{S \mid R}(s \mid r) &= \frac{f_{S,R}(s, r)}{f_R(r)} \\
&= \frac{f_S(s)\, f_{R \mid S}(r \mid s)}{f_R(r)} \\
&= K e^{-(s - \mu)^2 / 2\sigma^2}\, e^{-(r - s)^2 / 2}
\end{aligned}
\]

where K does not depend on s. Now,

\[
\begin{aligned}
\frac{(s - \mu)^2}{2\sigma^2} + \frac{(r - s)^2}{2}
&= s^2 \left( \frac{1}{2\sigma^2} + \frac{1}{2} \right) - \left( \frac{\mu}{\sigma^2} + r \right) s + C_1 \\
&= \frac{1 + \sigma^2}{2\sigma^2} \left( s - \frac{\mu + r\sigma^2}{1 + \sigma^2} \right)^2 + C_2
\end{aligned}
\]

where $C_1$ and $C_2$ do not depend on s. Thus, we may conclude that the conditional
distribution of S, the signal sent, given that r is received, is normal with mean and
variance now given by

\[
E[S \mid R = r] = \frac{\mu + r\sigma^2}{1 + \sigma^2}, \qquad
\mathrm{Var}(S \mid R = r) = \frac{\sigma^2}{1 + \sigma^2}
\]

Consequently, from Proposition 6.1, given that the value received is r, the
best estimate, in the sense of minimizing the mean square error, for the signal
sent is

\[
E[S \mid R = r] = \frac{1}{1 + \sigma^2}\, \mu + \frac{\sigma^2}{1 + \sigma^2}\, r
\]

Writing the conditional mean as we did previously is informative, for it shows that
it equals a weighted average of μ, the a priori expected value of the signal, and r,
the value received. The relative weights given to μ and r are in the same
proportion to each other as 1 (the conditional variance of the received signal
when s is sent) is to $\sigma^2$ (the variance of the signal to be sent).
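The weighted-average form translates directly into a small function. This is a minimal sketch of the formulas just derived; the function name `signal_estimate` and its argument names are assumptions made for illustration.

```python
def signal_estimate(r: float, mu: float, sigma2: float) -> tuple[float, float]:
    """Return (E[S | R = r], Var(S | R = r)) when S ~ N(mu, sigma2) and R | S = s ~ N(s, 1)."""
    weight_r = sigma2 / (1.0 + sigma2)   # weight on the received value r
    weight_mu = 1.0 - weight_r           # weight on the prior mean mu, equal to 1 / (1 + sigma2)
    mean = weight_mu * mu + weight_r * r
    var = sigma2 / (1.0 + sigma2)
    return mean, var


if __name__ == "__main__":
    # With equal prior and noise variances the estimate is the midpoint of mu and r.
    print(signal_estimate(r=2.0, mu=0.0, sigma2=1.0))
    # A very diffuse prior puts almost all of the weight on the received value.
    print(signal_estimate(r=2.0, mu=0.0, sigma2=1e6))
```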


Example 6c 


In digital signal processing, raw continuous analog data X must be quantized, or
discretized, in order to obtain a digital representation. In order to quantize the raw
data X, an increasing set of numbers $a_i$, $i = 0, \pm 1, \pm 2, \ldots$, such that
$\lim_{i \to +\infty} a_i = \infty$ and $\lim_{i \to -\infty} a_i = -\infty$ is fixed, and the raw data are then
quantized according to the interval $(a_i, a_{i+1}]$ in which X lies. Let us denote by $y_i$
the discretized value when $X \in (a_i, a_{i+1}]$, and let Y denote the observed
discretized value; that is,

\[
Y = y_i \quad \text{if } a_i < X \le a_{i+1}
\]

The distribution of Y is given by

\[
P\{Y = y_i\} = F_X(a_{i+1}) - F_X(a_i)
\]

Suppose now that we want to choose the values $y_i$, $i = 0, \pm 1, \pm 2, \ldots$ so as to
minimize $E[(X - Y)^2]$, the expected mean square difference between the raw
data and their quantized version.

a. Find the optimal values $y_i$, $i = 0, \pm 1, \ldots$.

For the optimal quantizer Y, show that

b. E[Y] = E[X], so the mean square error quantizer preserves the input
mean;

c. $\mathrm{Var}(Y) = \mathrm{Var}(X) - E[(X - Y)^2]$.


Solution 


(a) For any quantizer Y, upon conditioning on the value of Y, we obtain

\[
E[(X - Y)^2] = \sum_i E[(X - y_i)^2 \mid a_i < X \le a_{i+1}]\, P\{a_i < X \le a_{i+1}\}
\]

Now, if we let

\[
I = i \quad \text{if } a_i < X \le a_{i+1}
\]

then

\[
E[(X - y_i)^2 \mid a_i < X \le a_{i+1}] = E[(X - y_i)^2 \mid I = i]
\]

and by Proposition 6.1, this quantity is minimized when

\[
\begin{aligned}
y_i &= E[X \mid I = i] \\
&= E[X \mid a_i < X \le a_{i+1}] \\
&= \frac{\int_{a_i}^{a_{i+1}} x f_X(x)\, dx}{F_X(a_{i+1}) - F_X(a_i)}
\end{aligned}
\]

Now, since the optimal quantizer is given by Y = E[X|I], it follows that

(b) E[Y] = E[X]

(c)
\[
\begin{aligned}
\mathrm{Var}(X) &= E[\mathrm{Var}(X \mid I)] + \mathrm{Var}(E[X \mid I]) \\
&= E\big[E[(X - Y)^2 \mid I]\big] + \mathrm{Var}(Y) \\
&= E[(X - Y)^2] + \mathrm{Var}(Y)
\end{aligned}
\]
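Parts (a) through (c) can be illustrated on a case where every quantity has a closed form. The sketch below assumes, purely for illustration, that X is uniform on (0, 1) and that only finitely many cut points are used (a special case of the doubly infinite sequence in the example). For a uniform input the optimal level on each cell is the cell midpoint, and the cell's contribution to the mean square error is its length cubed divided by 12.

```python
from fractions import Fraction as F


def uniform_quantizer(cuts):
    """Optimal quantizer for X uniform on (0, 1) with increasing cut points from 0 to 1.

    Returns (levels, E[Y], Var(Y), E[(X - Y)^2]) where each level is
    y_i = E[X | a_i < X <= a_{i+1}], the midpoint of the cell.
    """
    levels, mean_y, second_y, mse = [], F(0), F(0), F(0)
    for a, b in zip(cuts, cuts[1:]):
        prob = b - a              # P{a < X <= b} for a uniform (0, 1) input
        y = (a + b) / 2           # conditional mean of X on the cell
        levels.append(y)
        mean_y += y * prob
        second_y += y * y * prob
        mse += prob ** 3 / 12     # prob * Var(uniform on (a, b]) = (b - a)^3 / 12
    return levels, mean_y, second_y - mean_y ** 2, mse


if __name__ == "__main__":
    cuts = [F(0), F(1, 4), F(1, 2), F(1)]  # hypothetical cut points
    levels, mean_y, var_y, mse = uniform_quantizer(cuts)
    # E[X] = 1/2 and Var(X) = 1/12 for a uniform (0, 1) random variable.
    print(levels, mean_y, var_y, F(1, 12) - mse)
```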


It sometimes happens that the joint probability distribution of X and Y is not 
completely known; or if it is known, it is such that the calculation of E[Y |X = x] is 
mathematically intractable. If, however, the means and variances of X and Y and the 
correlation of X and Y are known, then we can at least determine the best linear
predictor of Y with respect to X. 


To obtain the best linear predictor of Y with respect to X, we need to choose a and b
so as to minimize $E[(Y - (a + bX))^2]$. Now,

\[
\begin{aligned}
E[(Y - (a + bX))^2] &= E[Y^2 - 2aY - 2bXY + a^2 + 2abX + b^2 X^2] \\
&= E[Y^2] - 2aE[Y] - 2bE[XY] + a^2 + 2abE[X] + b^2 E[X^2]
\end{aligned}
\]

Taking partial derivatives, we obtain

\[
\begin{aligned}
\frac{\partial}{\partial a} E[(Y - a - bX)^2] &= -2E[Y] + 2a + 2bE[X] \\
\frac{\partial}{\partial b} E[(Y - a - bX)^2] &= -2E[XY] + 2aE[X] + 2bE[X^2]
\end{aligned}
\tag{6.3}
\]


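Setting Equations (6.3) to 0 and solving for a and b yields the solutions

\[
\begin{aligned}
b &= \frac{E[XY] - E[X]E[Y]}{E[X^2] - (E[X])^2} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x^2} = \rho \frac{\sigma_y}{\sigma_x} \\
a &= E[Y] - bE[X] = E[Y] - \frac{\rho \sigma_y E[X]}{\sigma_x}
\end{aligned}
\tag{6.4}
\]

where ρ = Correlation(X, Y), $\sigma_y^2 = \mathrm{Var}(Y)$, and $\sigma_x^2 = \mathrm{Var}(X)$. It is easy to verify that
the values of a and b from Equation (6.4) minimize $E[(Y - a - bX)^2]$; thus, the
best (in the sense of mean square error) linear predictor of Y with respect to X is

\[
\mu_y + \frac{\rho \sigma_y}{\sigma_x} (X - \mu_x)
\]

where $\mu_y = E[Y]$ and $\mu_x = E[X]$.

The mean square error of this predictor is given by

\[
\begin{aligned}
E\left[ \left( Y - \mu_y - \rho \frac{\sigma_y}{\sigma_x} (X - \mu_x) \right)^2 \right]
&= E[(Y - \mu_y)^2] + \rho^2 \frac{\sigma_y^2}{\sigma_x^2} E[(X - \mu_x)^2] - 2\rho \frac{\sigma_y}{\sigma_x} E[(Y - \mu_y)(X - \mu_x)] \\
&= \sigma_y^2 + \rho^2 \sigma_y^2 - 2\rho^2 \sigma_y^2 \\
&= \sigma_y^2 (1 - \rho^2)
\end{aligned}
\tag{6.5}
\]

We note from Equation (6.5) that if ρ is near +1 or −1, then the mean square
error of the best linear predictor is near zero.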
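Equations (6.4) and (6.5) can be exercised on a small discrete distribution. The sketch below assumes, for illustration only, that (X, Y) is equally likely to be any of a handful of listed points; the data and the function name are hypothetical. It computes a and b from the moments and compares the resulting mean square error with $\sigma_y^2(1 - \rho^2)$.

```python
def best_linear_predictor(points):
    """points: equally likely (x, y) pairs. Returns (a, b, mse, rho, var_y)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points) / n

    b = cov / var_x              # slope from Equation (6.4)
    a = mean_y - b * mean_x      # intercept from Equation (6.4)
    mse = sum((y - (a + b * x)) ** 2 for x, y in points) / n
    rho = cov / (var_x * var_y) ** 0.5
    return a, b, mse, rho, var_y


if __name__ == "__main__":
    data = [(0, 1), (1, 3), (2, 2), (3, 5)]  # hypothetical equally likely outcomes
    a, b, mse, rho, var_y = best_linear_predictor(data)
    print(a, b, mse, var_y * (1 - rho ** 2))
```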
Example 6d 


An example in which the conditional expectation of Y given X is linear in X, and 

hence in which the best linear predictor of Y with respect to X is the best overall 
predictor, is when X and Y have a bivariate normal distribution. For, as shown in 
Example 5d of Chapter 6, in that case,

\[
E[Y \mid X = x] = \mu_y + \rho \frac{\sigma_y}{\sigma_x} (x - \mu_x)
\]


7.7 Moment Generating Functions


The moment generating function M(t) of the random variable X is defined for all real 
values of t by 


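\[
M(t) = E[e^{tX}] =
\begin{cases}
\sum_{x} e^{tx} p(x) & \text{if } X \text{ is discrete with mass function } p(x) \\
\int_{-\infty}^{\infty} e^{tx} f(x)\, dx & \text{if } X \text{ is continuous with density } f(x)
\end{cases}
\]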


We call M(t) the moment generating function because all of the moments of X can 
be obtained by successively differentiating M(t) and then evaluating the result at 
t = 0. For example, 


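\[
\begin{aligned}
M'(t) &= \frac{d}{dt} E[e^{tX}] \\
&= E\left[ \frac{d}{dt} (e^{tX}) \right] \\
&= E[X e^{tX}]
\end{aligned}
\tag{7.1}
\]

where we have assumed that the interchange of the differentiation and expectation
operators is legitimate. That is, we have assumed that

\[
\frac{d}{dt} \left[ \sum_{x} e^{tx} p(x) \right] = \sum_{x} \frac{d}{dt} [e^{tx} p(x)]
\]

in the discrete case and

\[
\frac{d}{dt} \left[ \int e^{tx} f(x)\, dx \right] = \int \frac{d}{dt} [e^{tx} f(x)]\, dx
\]

in the continuous case. This assumption can almost always be justified and, indeed,
is valid for all of the distributions considered in this book. Hence, from Equation
(7.1), evaluated at t = 0, we obtain

\[
M'(0) = E[X]
\]

Similarly,

\[
\begin{aligned}
M''(t) &= \frac{d}{dt} M'(t) \\
&= \frac{d}{dt} E[X e^{tX}] \\
&= E\left[ \frac{d}{dt} (X e^{tX}) \right] \\
&= E[X^2 e^{tX}]
\end{aligned}
\]

Thus,

\[
M''(0) = E[X^2]
\]

In general, the nth derivative of M(t) is given by

\[
M^{(n)}(t) = E[X^n e^{tX}], \quad n \ge 1
\]

implying that

\[
M^{(n)}(0) = E[X^n], \quad n \ge 1
\]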
We now compute M(t) for some common distributions. 


Example 7a Binomial distribution with parameters n and p 


If X is a binomial random variable with parameters n and p, then 


\[
\begin{aligned}
M(t) &= E[e^{tX}] \\
&= \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} (pe^t)^k (1 - p)^{n-k} \\
&= (pe^t + 1 - p)^n
\end{aligned}
\]

where the last equality follows from the binomial theorem. Differentiation yields

\[
M'(t) = n (pe^t + 1 - p)^{n-1} pe^t
\]

Thus,

\[
E[X] = M'(0) = np
\]

Differentiating a second time yields

\[
M''(t) = n(n - 1)(pe^t + 1 - p)^{n-2} (pe^t)^2 + n (pe^t + 1 - p)^{n-1} pe^t
\]

so

\[
E[X^2] = M''(0) = n(n - 1)p^2 + np
\]

The variance of X is given by

\[
\begin{aligned}
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 \\
&= n(n - 1)p^2 + np - n^2 p^2 \\
&= np(1 - p)
\end{aligned}
\]

verifying the result obtained previously.
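Both the closed form and the moment property can be checked numerically. The following sketch is an illustration, not part of the original derivation: it evaluates the defining sum for the binomial moment generating function, compares it with $(pe^t + 1 - p)^n$, and approximates M'(0) with a central finite difference (the step size h is an arbitrary choice).

```python
from math import comb, exp


def binomial_mgf(t: float, n: int, p: float) -> float:
    """M(t) = sum_k e^{tk} C(n, k) p^k (1 - p)^(n - k), straight from the definition."""
    return sum(
        exp(t * k) * comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
    )


def derivative_at_zero(f, h: float = 1e-5) -> float:
    """Central-difference approximation of f'(0)."""
    return (f(h) - f(-h)) / (2 * h)


if __name__ == "__main__":
    n, p, t = 10, 0.3, 0.4
    print(binomial_mgf(t, n, p), (p * exp(t) + 1 - p) ** n)
    print(derivative_at_zero(lambda s: binomial_mgf(s, n, p)), n * p)
```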


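Example 7b Poisson distribution with mean λ


If X is a Poisson random variable with parameter λ, then

\[
\begin{aligned}
M(t) &= E[e^{tX}] \\
&= \sum_{n=0}^{\infty} \frac{e^{tn} e^{-\lambda} \lambda^n}{n!} \\
&= e^{-\lambda} \sum_{n=0}^{\infty} \frac{(\lambda e^t)^n}{n!} \\
&= e^{-\lambda} e^{\lambda e^t} \\
&= \exp\{\lambda (e^t - 1)\}
\end{aligned}
\]

Differentiation yields

\[
\begin{aligned}
M'(t) &= \lambda e^t \exp\{\lambda (e^t - 1)\} \\
M''(t) &= (\lambda e^t)^2 \exp\{\lambda (e^t - 1)\} + \lambda e^t \exp\{\lambda (e^t - 1)\}
\end{aligned}
\]

Thus,

\[
\begin{aligned}
E[X] &= M'(0) = \lambda \\
E[X^2] &= M''(0) = \lambda^2 + \lambda \\
\mathrm{Var}(X) &= E[X^2] - (E[X])^2 = \lambda
\end{aligned}
\]

Hence, both the mean and the variance of the Poisson random variable equal λ.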


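Example 7c Exponential distribution with parameter λ

\[
\begin{aligned}
M(t) &= E[e^{tX}] \\
&= \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\, dx \\
&= \lambda \int_0^{\infty} e^{-(\lambda - t)x}\, dx \\
&= \frac{\lambda}{\lambda - t} \quad \text{for } t < \lambda
\end{aligned}
\]

We note from this derivation that for the exponential distribution, M(t) is defined
only for values of t less than λ. Differentiation of M(t) yields

\[
M'(t) = \frac{\lambda}{(\lambda - t)^2}, \qquad M''(t) = \frac{2\lambda}{(\lambda - t)^3}
\]

Hence,

\[
E[X] = M'(0) = \frac{1}{\lambda}, \qquad E[X^2] = M''(0) = \frac{2}{\lambda^2}
\]

The variance of X is given by

\[
\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{1}{\lambda^2}
\]
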
Example 7d Normal distribution 


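We first compute the moment generating function of a standard normal random
variable with parameters 0 and 1. Letting Z be such a random variable, we have

\[
\begin{aligned}
M_Z(t) &= E[e^{tZ}] \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx} e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{x^2 - 2tx}{2} \right\} dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{ -\frac{(x - t)^2}{2} + \frac{t^2}{2} \right\} dx \\
&= e^{t^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x - t)^2/2}\, dx \\
&= e^{t^2/2}
\end{aligned}
\]

Hence, the moment generating function of the standard normal random variable
Z is given by $M_Z(t) = e^{t^2/2}$. To obtain the moment generating function of an
arbitrary normal random variable, we recall (see Section 5.4) that X = μ + σZ
will have a normal distribution with parameters μ and $\sigma^2$ whenever Z is a
standard normal random variable. Hence, the moment generating function of
such a random variable is given by

\[
\begin{aligned}
M_X(t) &= E[e^{tX}] \\
&= E[e^{t(\mu + \sigma Z)}] \\
&= E[e^{t\mu} e^{t\sigma Z}] \\
&= e^{t\mu} E[e^{t\sigma Z}] \\
&= e^{t\mu} M_Z(t\sigma) \\
&= e^{t\mu} e^{(t\sigma)^2/2} \\
&= \exp\left\{ \frac{\sigma^2 t^2}{2} + \mu t \right\}
\end{aligned}
\]

By differentiating, we obtain

\[
\begin{aligned}
M_X'(t) &= (\mu + t\sigma^2) \exp\left\{ \frac{\sigma^2 t^2}{2} + \mu t \right\} \\
M_X''(t) &= (\mu + t\sigma^2)^2 \exp\left\{ \frac{\sigma^2 t^2}{2} + \mu t \right\} + \sigma^2 \exp\left\{ \frac{\sigma^2 t^2}{2} + \mu t \right\}
\end{aligned}
\]

Thus,

\[
\begin{aligned}
E[X] &= M_X'(0) = \mu \\
E[X^2] &= M_X''(0) = \mu^2 + \sigma^2
\end{aligned}
\]

implying that

\[
\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \sigma^2
\]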


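Tables 7.1 and 7.2 give the moment generating functions for
some common discrete and continuous distributions.


Table 7.1 Discrete probability distribution.

| Distribution | Probability mass function, p(x) | Moment generating function, M(t) | Mean | Variance |
| Binomial with parameters n, p; $0 \le p \le 1$ | $\binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, ..., n | $(pe^t + 1 - p)^n$ | $np$ | $np(1-p)$ |
| Poisson with parameter λ > 0 | $e^{-\lambda} \frac{\lambda^x}{x!}$, x = 0, 1, 2, ... | $\exp\{\lambda(e^t - 1)\}$ | $\lambda$ | $\lambda$ |
| Geometric with parameter $0 \le p \le 1$ | $p(1-p)^{x-1}$, x = 1, 2, ... | $\frac{pe^t}{1 - (1-p)e^t}$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ |
| Negative binomial with parameters r, p; $0 \le p \le 1$ | $\binom{n-1}{r-1} p^r (1-p)^{n-r}$, n = r, r + 1, ... | $\left[ \frac{pe^t}{1 - (1-p)e^t} \right]^r$ | $\frac{r}{p}$ | $\frac{r(1-p)}{p^2}$ |

Table 7.2 Continuous probability distribution.

| Distribution | Probability density function, f(x) | Moment generating function, M(t) | Mean | Variance |
| Uniform over (a, b) | $f(x) = \frac{1}{b-a}$ for a < x < b; 0 otherwise | $\frac{e^{tb} - e^{ta}}{t(b-a)}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Exponential with parameter λ > 0 | $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$; 0 for x < 0 | $\frac{\lambda}{\lambda - t}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
| Gamma with parameters (s, λ), λ > 0 | $f(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{s-1}}{\Gamma(s)}$ for $x \ge 0$; 0 for x < 0 | $\left( \frac{\lambda}{\lambda - t} \right)^s$ | $\frac{s}{\lambda}$ | $\frac{s}{\lambda^2}$ |
| Normal with parameters $(\mu, \sigma^2)$ | $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/2\sigma^2}$, $-\infty < x < \infty$ | $\exp\left\{ \mu t + \frac{\sigma^2 t^2}{2} \right\}$ | $\mu$ | $\sigma^2$ |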


An important property of moment generating functions is that the moment generating 
function of the sum of independent random variables equals the product of the 
individual moment generating functions. To prove this, suppose that X and Y are 
independent and have moment generating functions $M_X(t)$ and $M_Y(t)$, respectively.
Then $M_{X+Y}(t)$, the moment generating function of X + Y, is given by

\[
\begin{aligned}
M_{X+Y}(t) &= E[e^{t(X+Y)}] \\
&= E[e^{tX} e^{tY}] \\
&= E[e^{tX}]\, E[e^{tY}] \\
&= M_X(t)\, M_Y(t)
\end{aligned}
\]

where the next-to-last equality follows from Proposition 4.1, since X and Y are
independent.


Another important result is that the moment generating function uniquely determines 
the distribution. That is, if $M_X(t)$ exists and is finite in some region about t = 0, then
the distribution of X is uniquely determined. For instance, if

\[
M_X(t) = \left( \frac{1}{2} \right)^{10} (e^t + 1)^{10},
\]

then it follows from Table 7.1 that X is a binomial random variable with
parameters 10 and 1/2.

Example 7e 


Suppose that the moment generating function of a random variable X is given by
$M(t) = e^{3(e^t - 1)}$. What is P{X = 0}?


Solution


We see from Table 7.1 that $M(t) = e^{3(e^t - 1)}$ is the moment generating
function of a Poisson random variable with mean 3. Hence, by the one-to-one
correspondence between moment generating functions and distribution
functions, it follows that X must be a Poisson random variable with mean 3. Thus,
$P\{X = 0\} = e^{-3}$.

Example 7f Sums of independent binomial random variables 

If X and Y are independent binomial random variables with parameters (n, p) and 
(m, p), respectively, what is the distribution of X + Y? 


Solution 


The moment generating function of X + Y is given by 


\[
\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) = (pe^t + 1 - p)^n (pe^t + 1 - p)^m \\
&= (pe^t + 1 - p)^{m+n}
\end{aligned}
\]

However, $(pe^t + 1 - p)^{m+n}$ is the moment generating function of a binomial
random variable having parameters m + n and p. Thus, this must be the
distribution of X + Y.
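The same conclusion can be reached without moment generating functions by convolving the two mass functions directly, which gives an independent check. This is a minimal sketch with exact fractions; the helper names and the parameter values in the usage example are illustrative assumptions.

```python
from fractions import Fraction as F
from math import comb


def binomial_pmf(n, p):
    """List of P{X = k} for k = 0..n, for a binomial (n, p) random variable."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(pmf_a, pmf_b):
    """pmf of the sum of two independent nonnegative integer random variables."""
    out = [0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out


if __name__ == "__main__":
    p = F(1, 3)
    lhs = convolve(binomial_pmf(3, p), binomial_pmf(4, p))
    print(lhs == binomial_pmf(7, p))
```

The equality holds term by term because of the Vandermonde identity for binomial coefficients; it would fail if the two success probabilities differed.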


Example 7g Sums of independent Poisson random variables

Calculate the distribution of X + Y when X and Y are independent Poisson
random variables with means respectively $\lambda_1$ and $\lambda_2$.

Solution

\[
\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) \\
&= \exp\{\lambda_1 (e^t - 1)\} \exp\{\lambda_2 (e^t - 1)\} \\
&= \exp\{(\lambda_1 + \lambda_2)(e^t - 1)\}
\end{aligned}
\]

Hence, X + Y is Poisson distributed with mean $\lambda_1 + \lambda_2$, verifying the result given
in Example 3e of Chapter 6.
Example 7h Sums of independent normal random variables


Show that if X and Y are independent normal random variables with respective
parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, then X + Y is normal with mean $\mu_1 + \mu_2$ and
variance $\sigma_1^2 + \sigma_2^2$.


Solution

\[
\begin{aligned}
M_{X+Y}(t) &= M_X(t)\, M_Y(t) \\
&= \exp\left\{ \frac{\sigma_1^2 t^2}{2} + \mu_1 t \right\} \exp\left\{ \frac{\sigma_2^2 t^2}{2} + \mu_2 t \right\} \\
&= \exp\left\{ \frac{(\sigma_1^2 + \sigma_2^2) t^2}{2} + (\mu_1 + \mu_2) t \right\}
\end{aligned}
\]

which is the moment generating function of a normal random variable with mean
$\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. The desired result then follows because the
moment generating function uniquely determines the distribution.


Example 7i 


Compute the moment generating function of a chi-squared random variable with 
n degrees of freedom. 


Solution 


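We can represent such a random variable as

\[
Z_1^2 + \cdots + Z_n^2
\]

where $Z_1, \ldots, Z_n$ are independent standard normal random variables. Let M(t) be
its moment generating function. Then, by the preceding,

\[
M(t) = \left( E[e^{tZ^2}] \right)^n
\]

where Z is a standard normal random variable. Now,

\[
\begin{aligned}
E[e^{tZ^2}] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2} e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2}\, dx \qquad \text{where } \sigma^2 = (1 - 2t)^{-1} \\
&= \sigma \\
&= (1 - 2t)^{-1/2}
\end{aligned}
\]

where the next-to-last equality uses the fact that the normal density with mean 0
and variance $\sigma^2$ integrates to 1. Therefore,

\[
M(t) = (1 - 2t)^{-n/2}
\]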

Example 7j Moment generating function of the sum of a random number of 
random variables 


Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, and let N be a nonnegative, integer-valued random variable that is
independent of the sequence $X_i$, $i \ge 1$. We want to compute the moment
generating function of

\[
Y = \sum_{i=1}^{N} X_i
\]

(In Example 5d, Y was interpreted as the amount of money spent in a store
on a given day when both the amount spent by a customer and the number of
customers are random variables.)


To compute the moment generating function of Y, we first condition on N as
follows:

\[
\begin{aligned}
E\left[ \exp\left\{ t \sum_{i=1}^{N} X_i \right\} \,\Big|\, N = n \right]
&= E\left[ \exp\left\{ t \sum_{i=1}^{n} X_i \right\} \,\Big|\, N = n \right] \\
&= E\left[ \exp\left\{ t \sum_{i=1}^{n} X_i \right\} \right] \\
&= [M_X(t)]^n
\end{aligned}
\]

where

\[
M_X(t) = E[e^{tX_i}]
\]

Hence,

\[
E[e^{tY} \mid N] = (M_X(t))^N
\]

Thus,

\[
M_Y(t) = E[(M_X(t))^N]
\]

The moments of Y can now be obtained upon differentiation, as follows:

\[
M_Y'(t) = E[N (M_X(t))^{N-1} M_X'(t)]
\]

So

\[
\begin{aligned}
E[Y] &= M_Y'(0) \\
&= E[N (M_X(0))^{N-1} M_X'(0)] \\
&= E[N E[X]] \\
&= E[N]\, E[X]
\end{aligned}
\tag{7.2}
\]

verifying the result of Example 5d. (In this last set of equalities, we have used
the fact that $M_X(0) = E[e^{0X}] = 1$.)


Also,

\[
M_Y''(t) = E[N(N - 1)(M_X(t))^{N-2} (M_X'(t))^2 + N (M_X(t))^{N-1} M_X''(t)]
\]

so

\[
\begin{aligned}
E[Y^2] &= M_Y''(0) \\
&= E[N(N - 1)(E[X])^2 + N E[X^2]] \\
&= (E[X])^2 (E[N^2] - E[N]) + E[N]\, E[X^2] \\
&= E[N] (E[X^2] - (E[X])^2) + (E[X])^2 E[N^2] \\
&= E[N]\, \mathrm{Var}(X) + (E[X])^2 E[N^2]
\end{aligned}
\tag{7.3}
\]

Hence, from Equations (7.2) and (7.3), we have

\[
\begin{aligned}
\mathrm{Var}(Y) &= E[N]\, \mathrm{Var}(X) + (E[X])^2 (E[N^2] - (E[N])^2) \\
&= E[N]\, \mathrm{Var}(X) + (E[X])^2\, \mathrm{Var}(N)
\end{aligned}
\]


Example 7k 


Let Y denote a uniform random variable on (0, 1), and suppose that conditional
on Y = p, the random variable X has a binomial distribution with parameters n
and p. In Example 5l, we showed that X is equally likely to take on any of the
values 0, 1, ..., n. Establish this result by using moment generating functions.


Solution


To compute the moment generating function of X, start by conditioning on the
value of Y. Using the formula for the binomial moment generating function gives

\[
E[e^{tX} \mid Y = p] = (pe^t + 1 - p)^n
\]

Now, Y is uniform on (0, 1), so, upon taking expectations, we obtain

\[
\begin{aligned}
E[e^{tX}] &= \int_0^1 (pe^t + 1 - p)^n\, dp \\
&= \frac{1}{e^t - 1} \int_1^{e^t} y^n\, dy \qquad \text{(by the substitution } y = pe^t + 1 - p\text{)} \\
&= \frac{1}{n + 1} \cdot \frac{e^{t(n+1)} - 1}{e^t - 1} \\
&= \frac{1}{n + 1} \left( 1 + e^t + e^{2t} + \cdots + e^{nt} \right)
\end{aligned}
\]

Because the preceding is the moment generating function of a random variable 
that is equally likely to be any of the values 0, 1, ...,n, the desired result follows 
from the fact that the moment generating function of a random variable uniquely 
determines its distribution. 


7.7.1 Joint moment generating functions 


It is also possible to define the joint moment generating function of two or more 
random variables. This is done as follows: For any n random variables $X_1, \ldots, X_n$, the
joint moment generating function, $M(t_1, \ldots, t_n)$, is defined, for all real values of
$t_1, \ldots, t_n$, by

\[
M(t_1, \ldots, t_n) = E[e^{t_1 X_1 + \cdots + t_n X_n}]
\]

The individual moment generating functions can be obtained from $M(t_1, \ldots, t_n)$ by
letting all but one of the $t_j$'s be 0. That is,

\[
M_{X_i}(t) = E[e^{tX_i}] = M(0, \ldots, 0, t, 0, \ldots, 0)
\]

where the t is in the ith place.

It can be proven (although the proof is too advanced for this text) that the joint
moment generating function $M(t_1, \ldots, t_n)$ uniquely determines the joint distribution of
$X_1, \ldots, X_n$. This result can then be used to prove that the n random variables $X_1, \ldots, X_n$
are independent if and only if

\[
M(t_1, \ldots, t_n) = M_{X_1}(t_1) \cdots M_{X_n}(t_n)
\tag{7.4}
\]

For the proof in one direction, if the n random variables are independent, then

\[
\begin{aligned}
M(t_1, \ldots, t_n) &= E[e^{(t_1 X_1 + \cdots + t_n X_n)}] \\
&= E[e^{t_1 X_1} \cdots e^{t_n X_n}] \\
&= E[e^{t_1 X_1}] \cdots E[e^{t_n X_n}] \qquad \text{by independence} \\
&= M_{X_1}(t_1) \cdots M_{X_n}(t_n)
\end{aligned}
\]

For the proof in the other direction, if Equation (7.4) is satisfied, then the joint
moment generating function $M(t_1, \ldots, t_n)$ is the same as the joint moment generating
function of n independent random variables, the ith of which has the same
distribution as $X_i$. As the joint moment generating function uniquely determines the
joint distribution, this must be the joint distribution; hence, the random variables are
independent.


Example 7l


Let X and Y be independent normal random variables, each with mean μ and
variance $\sigma^2$. In Example 7a of Chapter 6, we showed that X + Y and X − Y
are independent. Let us now establish that X + Y and X − Y are independent by
computing their joint moment generating function:

\[
\begin{aligned}
E[e^{t(X+Y) + s(X-Y)}] &= E[e^{(t+s)X + (t-s)Y}] \\
&= E[e^{(t+s)X}]\, E[e^{(t-s)Y}] \\
&= e^{\mu(t+s) + \sigma^2 (t+s)^2/2}\, e^{\mu(t-s) + \sigma^2 (t-s)^2/2} \\
&= e^{2\mu t + \sigma^2 t^2}\, e^{\sigma^2 s^2}
\end{aligned}
\]

But we recognize the preceding as the joint moment generating function of
a normal random variable with mean 2μ and variance $2\sigma^2$ and an
independent normal random variable with mean 0 and variance $2\sigma^2$. Because the
joint moment generating function uniquely determines the joint distribution, it
follows that X + Y and X − Y are independent normal random variables.

In the next example, we use the joint moment generating function to verify a result
that was established in Example 2b of Chapter 6.


Example 7m 


Suppose that the number of events that occur is a Poisson random variable with 
mean A and that each event is independently counted with probability p. Show 
that the number of counted events and the number of uncounted events are 
independent Poisson random variables with respective means Ap and A(1 — p). 


Solution 


Let X denote the total number of events, and let $X_c$ denote the number of them
that are counted. To compute the joint moment generating function of $X_c$, the
number of events that are counted, and $X - X_c$, the number that are uncounted,
start by conditioning on X to obtain

$$\begin{aligned}
E[e^{sX_c+t(X-X_c)} \mid X = n] &= e^{tn}\, E[e^{(s-t)X_c} \mid X = n] \\
&= e^{tn}\,(pe^{s-t} + 1 - p)^n \\
&= (pe^s + (1-p)e^t)^n
\end{aligned}$$

which follows because, conditional on X = n, $X_c$ is a binomial random variable
with parameters n and p. Hence,

$$E[e^{sX_c+t(X-X_c)} \mid X] = (pe^s + (1-p)e^t)^X$$

Taking expectations of both sides of this equation yields

$$E[e^{sX_c+t(X-X_c)}] = E[(pe^s + (1-p)e^t)^X]$$

Now, since X is Poisson with mean $\lambda$, it follows that $E[e^{tX}] = e^{\lambda(e^t-1)}$. Therefore,
for any positive value a we see (by letting $a = e^t$) that $E[a^X] = e^{\lambda(a-1)}$. Thus,

$$\begin{aligned}
E[e^{sX_c+t(X-X_c)}] &= e^{\lambda(pe^s+(1-p)e^t-1)} \\
&= e^{\lambda p(e^s-1)}\, e^{\lambda(1-p)(e^t-1)}
\end{aligned}$$

As the preceding is the joint moment generating function of independent Poisson
random variables with respective means $\lambda p$ and $\lambda(1 - p)$, the result is proven.
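
As an informal check on this result, the sketch below simulates the counting scheme. It is an illustration only: the values of lam and p, the seed, and all function names are assumptions, and the Poisson sampler is Knuth's multiplication method, swapped in here because the standard library has no Poisson generator. The sample means should be near $\lambda p$ and $\lambda(1 - p)$, and the sample covariance near zero.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small values of lam."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counting(lam, p, n_trials, seed=0):
    """Return lists of counted and uncounted event totals over n_trials runs."""
    rng = random.Random(seed)
    counted, uncounted = [], []
    for _ in range(n_trials):
        total = sample_poisson(lam, rng)
        # Each event is counted independently with probability p.
        c = sum(1 for _ in range(total) if rng.random() < p)
        counted.append(c)
        uncounted.append(total - c)
    return counted, uncounted


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


# Illustrative values (assumptions): lam = 4, p = 0.3.
counted, uncounted = simulate_counting(lam=4.0, p=0.3, n_trials=50_000)
mean_counted = mean(counted)
mean_uncounted = mean(uncounted)
cov_counted_uncounted = covariance(counted, uncounted)

print(mean_counted)           # should be close to lam * p = 1.2
print(mean_uncounted)         # should be close to lam * (1 - p) = 2.8
print(cov_counted_uncounted)  # should be close to 0
```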


7.8 Additional Properties of Normal Random Variables


7.8.1 The multivariate normal distribution 


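Let $Z_1, \ldots, Z_n$ be a set of n independent standard normal random variables. If, for
some constants $a_{ij}$, $1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,

$$\begin{aligned}
X_1 &= a_{11}Z_1 + \cdots + a_{1n}Z_n + \mu_1 \\
X_2 &= a_{21}Z_1 + \cdots + a_{2n}Z_n + \mu_2 \\
&\;\;\vdots \\
X_i &= a_{i1}Z_1 + \cdots + a_{in}Z_n + \mu_i \\
&\;\;\vdots \\
X_m &= a_{m1}Z_1 + \cdots + a_{mn}Z_n + \mu_m
\end{aligned}$$

then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal
distribution.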


From the fact that the sum of independent normal random variables is itself a normal
random variable, it follows that each $X_i$ is a normal random variable with mean and
variance given, respectively, by

$$E[X_i] = \mu_i, \qquad \mathrm{Var}(X_i) = \sum_{j=1}^{n} a_{ij}^2$$


Let us now consider 


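$$M(t_1, \ldots, t_m) = E[\exp\{t_1 X_1 + \cdots + t_m X_m\}]$$

the joint moment generating function of $X_1, \ldots, X_m$. The first thing to note is that since
$\sum_{i=1}^{m} t_i X_i$ is itself a linear combination of the independent normal random variables
$Z_1, \ldots, Z_n$, it is also normally distributed. Its mean and variance are

$$E\left[\sum_{i=1}^{m} t_i X_i\right] = \sum_{i=1}^{m} t_i \mu_i$$

and

$$\mathrm{Var}\left(\sum_{i=1}^{m} t_i X_i\right) = \mathrm{Cov}\left(\sum_{i=1}^{m} t_i X_i,\ \sum_{j=1}^{m} t_j X_j\right) = \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j\, \mathrm{Cov}(X_i, X_j)$$

Now, if Y is a normal random variable with mean $\mu$ and variance $\sigma^2$, then

$$E[e^{Y}] = M_Y(t)\big|_{t=1} = e^{\mu + \sigma^2/2}$$

Thus,

$$M(t_1, \ldots, t_m) = \exp\left\{\sum_{i=1}^{m} t_i \mu_i + \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j\, \mathrm{Cov}(X_i, X_j)\right\}$$

which shows that the joint distribution of $X_1, \ldots, X_m$ is completely determined from a
knowledge of the values of $E[X_i]$ and $\mathrm{Cov}(X_i, X_j)$, $i, j = 1, \ldots, m$.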
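
The construction above translates directly into a sampling recipe. The sketch below is illustrative only: the coefficient matrix a, the vector mu, the seed, and the function names are assumptions. It uses the fact, which follows from the definition and the independence of the $Z_k$, that $\mathrm{Cov}(X_i, X_j) = \sum_{k} a_{ik} a_{jk}$, and compares that exact value with an empirical covariance.

```python
import random


def covariance_matrix(a):
    """Cov(X_i, X_j) = sum_k a[i][k] * a[j][k] when X = a Z + mu, Z standard normal."""
    m, n = len(a), len(a[0])
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]


def sample_x(a, mu, rng):
    """One draw of (X_1, ..., X_m) built from n independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
    return [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) + mu_i
            for row, mu_i in zip(a, mu)]


# Hypothetical coefficients and means (m = 2, n = 3), chosen for illustration.
a = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
mu = [1.0, -2.0]

exact_cov = covariance_matrix(a)

rng = random.Random(0)
draws = [sample_x(a, mu, rng) for _ in range(50_000)]
mean_1 = sum(d[0] for d in draws) / len(draws)
mean_2 = sum(d[1] for d in draws) / len(draws)
empirical_cov_12 = sum((d[0] - mean_1) * (d[1] - mean_2) for d in draws) / (len(draws) - 1)

print(exact_cov)                 # exact covariances computed from the coefficients
print(mean_1, mean_2)            # should be close to mu
print(empirical_cov_12)          # should be close to exact_cov[0][1]
```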


It can be shown that when m = 2, the multivariate normal distribution reduces to the 
bivariate normal. 


Example 8a 


Find P(X < Y) for bivariate normal random variables X and Y having parameters
$\mu_x = E[X]$, $\mu_y = E[Y]$, $\sigma_x^2 = \mathrm{Var}(X)$, $\sigma_y^2 = \mathrm{Var}(Y)$, $\rho = \mathrm{Corr}(X, Y)$.


Solution 


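Because X - Y is normal with mean

$$E[X - Y] = \mu_x - \mu_y$$

and variance

$$\begin{aligned}
\mathrm{Var}(X - Y) &= \mathrm{Var}(X) + \mathrm{Var}(-Y) + 2\,\mathrm{Cov}(X, -Y) \\
&= \sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y
\end{aligned}$$

we obtain

$$\begin{aligned}
P\{X < Y\} &= P\{X - Y < 0\} \\
&= P\left\{\frac{X - Y - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}} < \frac{-(\mu_x - \mu_y)}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}}\right\} \\
&= \Phi\left(\frac{\mu_y - \mu_x}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}}\right)
\end{aligned}$$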
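
The final formula is easy to evaluate numerically. The following is a small sketch under the assumption that the standard normal distribution function $\Phi$ is computed from the error function; the function names and the sample arguments are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_x_less_than_y(mu_x, mu_y, sigma_x, sigma_y, rho):
    """P{X < Y} for bivariate normal X, Y, using the formula of Example 8a."""
    var_diff = sigma_x ** 2 + sigma_y ** 2 - 2.0 * rho * sigma_x * sigma_y
    if var_diff <= 0.0:
        raise ValueError("X - Y must have positive variance")
    return standard_normal_cdf((mu_y - mu_x) / math.sqrt(var_diff))


# With equal means the answer is 1/2 by symmetry, whatever the correlation.
p_equal_means = prob_x_less_than_y(0.0, 0.0, 1.0, 1.0, 0.5)
# Here Var(X - Y) = 1 + 1 - 2(0.5) = 1, so the answer is Phi(1).
p_shifted = prob_x_less_than_y(0.0, 1.0, 1.0, 1.0, 0.5)
print(p_equal_means, p_shifted)
```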


Example 8b 


Suppose that the conditional distribution of X, given that $\Theta = \theta$, is normal with
mean $\theta$ and variance 1. Moreover, suppose that $\Theta$ itself is a normal random
variable with mean $\mu$ and variance $\sigma^2$. Find the conditional distribution of $\Theta$ given
that X = x.


Solution 


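Rather than using and then simplifying Bayes's formula, we will solve this
problem by first showing that $X, \Theta$ has a bivariate normal distribution. To do so,
note that the joint density function of $X, \Theta$ can be written as

$$f_{X,\Theta}(x, \theta) = f_{X|\Theta}(x \mid \theta)\, f_{\Theta}(\theta)$$

where $f_{X|\Theta}(x \mid \theta)$ is a normal density with mean $\theta$ and variance 1. However, if we
let Z be a standard normal random variable that is independent of $\Theta$, then the
conditional distribution of $Z + \Theta$, given that $\Theta = \theta$, is also normal with mean $\theta$ and
variance 1. Consequently, the joint density of $Z + \Theta, \Theta$ is the same as that of $X, \Theta$.
Because the former joint density is clearly bivariate normal (since $Z + \Theta$ and $\Theta$
are both linear combinations of the independent normal random variables Z and
$\Theta$), it follows that $X, \Theta$ has a bivariate normal distribution. Now,

$$\begin{aligned}
E[X] &= E[Z + \Theta] = \mu \\
\mathrm{Var}(X) &= \mathrm{Var}(Z + \Theta) = 1 + \sigma^2
\end{aligned}$$

and

$$\begin{aligned}
\rho = \mathrm{Corr}(X, \Theta) &= \mathrm{Corr}(Z + \Theta, \Theta) \\
&= \frac{\mathrm{Cov}(Z + \Theta, \Theta)}{\sqrt{\mathrm{Var}(Z + \Theta)\,\mathrm{Var}(\Theta)}} \\
&= \frac{\sigma}{\sqrt{1 + \sigma^2}}
\end{aligned}$$

Because $X, \Theta$ has a bivariate normal distribution, the conditional distribution of $\Theta$,
given that X = x, is normal with mean

$$\begin{aligned}
E[\Theta \mid X = x] &= E[\Theta] + \rho\sqrt{\frac{\mathrm{Var}(\Theta)}{\mathrm{Var}(X)}}\,(x - E[X]) \\
&= \mu + \frac{\sigma^2}{1 + \sigma^2}(x - \mu)
\end{aligned}$$

and variance

$$\begin{aligned}
\mathrm{Var}(\Theta \mid X = x) &= \mathrm{Var}(\Theta)(1 - \rho^2) \\
&= \frac{\sigma^2}{1 + \sigma^2}
\end{aligned}$$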
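
The two closing formulas can be packaged as a short function. This is a sketch for illustration; the function name and the numerical inputs are assumptions. The second function recomputes the same quantities from the intermediate expressions for $\rho$, $\mathrm{Var}(\Theta)$, and $\mathrm{Var}(X)$, as a consistency check on the simplification.

```python
import math


def conditional_theta_given_x(x, mu, sigma_sq):
    """Mean and variance of Theta given X = x, from the simplified formulas."""
    weight = sigma_sq / (1.0 + sigma_sq)
    return mu + weight * (x - mu), sigma_sq / (1.0 + sigma_sq)


def conditional_theta_via_rho(x, mu, sigma_sq):
    """Same quantities computed from rho, Var(Theta), and Var(X) before simplifying."""
    var_theta = sigma_sq
    var_x = 1.0 + sigma_sq
    rho = math.sqrt(sigma_sq) / math.sqrt(1.0 + sigma_sq)
    mean = mu + rho * math.sqrt(var_theta / var_x) * (x - mu)
    variance = var_theta * (1.0 - rho ** 2)
    return mean, variance


# Hypothetical inputs: prior mean 0, prior variance 4, observation x = 5.
cond_mean, cond_var = conditional_theta_given_x(x=5.0, mu=0.0, sigma_sq=4.0)
print(cond_mean, cond_var)
```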


7.8.2 The joint distribution of the sample mean and sample 
variance 


Let $X_1, \ldots, X_n$ be independent normal random variables, each with mean $\mu$ and
variance $\sigma^2$. Let $\bar{X} = \sum_{i=1}^{n} X_i/n$ denote their sample mean. Since the sum of
independent normal random variables is also a normal random variable, it follows
that $\bar{X}$ is a normal random variable with (from Examples 2c and 4a) expected
value $\mu$ and variance $\sigma^2/n$.

Now, recall from Example 4e that

(8.1)
$$\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0, \quad i = 1, \ldots, n$$


Also, note that since $\bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}$ are all linear combinations of the
independent standard normals $(X_i - \mu)/\sigma$, $i = 1, \ldots, n$, it follows that
$\bar{X}, X_i - \bar{X}$, $i = 1, \ldots, n$ has a joint distribution that is multivariate normal. If we let Y be a
normal random variable, with mean $\mu$ and variance $\sigma^2/n$, that is independent of the
$X_i$, $i = 1, \ldots, n$, then $Y, X_i - \bar{X}$, $i = 1, \ldots, n$ also has a multivariate normal distribution and,
indeed, because of Equation (8.1), has the same expected values and
covariances as the random variables $\bar{X}, X_i - \bar{X}$, $i = 1, \ldots, n$. But since a multivariate
normal distribution is determined completely by its expected values and covariances,
it follows that $Y, X_i - \bar{X}$, $i = 1, \ldots, n$ and $\bar{X}, X_i - \bar{X}$, $i = 1, \ldots, n$ have the same joint
distribution, thus showing that $\bar{X}$ is independent of the sequence of deviations
$X_i - \bar{X}$, $i = 1, \ldots, n$.

Since $\bar{X}$ is independent of the sequence of deviations $X_i - \bar{X}$, $i = 1, \ldots, n$, it is also
independent of the sample variance

$$S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n - 1).$$


Since we already know that $\bar{X}$ is normal with mean $\mu$ and variance $\sigma^2/n$, it remains
only to determine the distribution of $S^2$. To accomplish this, recall, from Example
4a, the algebraic identity

$$\begin{aligned}
(n - 1)S^2 &= \sum_{i=1}^{n} (X_i - \bar{X})^2 \\
&= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2
\end{aligned}$$

Upon dividing the preceding equation by $\sigma^2$, we obtain

(8.2)
$$\frac{(n - 1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$$

Now,

$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$$

is the sum of the squares of n independent standard normal random variables and so
is a chi-squared random variable with n degrees of freedom. Hence, from Example
7i, its moment generating function is $(1 - 2t)^{-n/2}$. Also, because

$$\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$$

is the square of a standard normal variable, it is a chi-squared random variable with
1 degree of freedom, and so has moment generating function $(1 - 2t)^{-1/2}$. Now, we
have seen previously that the two random variables on the left side of Equation
(8.2) are independent. Hence, as the moment generating function of the sum of
independent random variables is equal to the product of their individual moment
generating functions, we have

$$E[e^{t(n-1)S^2/\sigma^2}]\,(1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}$$

or

$$E[e^{t(n-1)S^2/\sigma^2}] = (1 - 2t)^{-(n-1)/2}$$

But as $(1 - 2t)^{-(n-1)/2}$ is the moment generating function of a chi-squared random
variable with n - 1 degrees of freedom, we can conclude, since the moment
generating function uniquely determines the distribution of the random variable, that
that is the distribution of $(n - 1)S^2/\sigma^2$.


Summing up, we have shown the following. 


Proposition 8.1 


If $X_1, \ldots, X_n$ are independent and identically distributed normal random variables
with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X}$ and the sample variance
$S^2$ are independent. $\bar{X}$ is a normal random variable with mean $\mu$ and variance
$\sigma^2/n$; $(n - 1)S^2/\sigma^2$ is a chi-squared random variable with n - 1 degrees of
freedom.
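
Proposition 8.1 lends itself to a quick simulation check. The sketch below is illustrative only; the sample size, parameters, seed, and function names are assumptions. It uses the standard facts that a chi-squared random variable with k degrees of freedom has mean k and variance 2k, and it reports the sample correlation between the sample mean and the sample variance. A correlation near zero is consistent with independence but is not a proof of it.

```python
import math
import random


def sample_mean_and_variance(values):
    """Return the sample mean and the (n - 1)-denominator sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def simulate(mu, sigma, n, n_trials, seed=0):
    """For each trial, record the sample mean and (n - 1) S^2 / sigma^2."""
    rng = random.Random(seed)
    means, scaled_vars = [], []
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean, variance = sample_mean_and_variance(sample)
        means.append(mean)
        scaled_vars.append((n - 1) * variance / sigma ** 2)
    return means, scaled_vars


def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Illustrative values (assumptions): n = 5 observations per sample.
means, scaled_vars = simulate(mu=10.0, sigma=3.0, n=5, n_trials=20_000)
avg_scaled_var = sum(scaled_vars) / len(scaled_vars)
_, var_scaled_var = sample_mean_and_variance(scaled_vars)
_, var_of_means = sample_mean_and_variance(means)
corr = correlation(means, scaled_vars)

print(avg_scaled_var)   # should be close to n - 1 = 4
print(var_scaled_var)   # should be close to 2(n - 1) = 8
print(var_of_means)     # should be close to sigma^2 / n = 1.8
print(corr)             # should be close to 0
```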


7.9 General Definition of Expectation 


Up to this point, we have defined expectations only for discrete and continuous 
random variables. However, there also exist random variables that are neither 
discrete nor continuous, and they, too, may possess an expectation. As an example 
of such a random variable, let X be a Bernoulli random variable with parameter
$p = \frac{1}{2}$, and let Y be a uniformly distributed random variable over the interval [0, 1].
Furthermore, suppose that X and Y are independent, and define the new random
variable W by

$$W = \begin{cases} X & \text{if } X = 1 \\ Y & \text{if } X \ne 1 \end{cases}$$

Clearly, W is neither a discrete (since its set of possible values, [0, 1], is
uncountable) nor a continuous (since $P\{W = 1\} = \frac{1}{2}$) random variable.


In order to define the expectation of an arbitrary random variable, we require the 
notion of a Stieltjes integral. Before defining this integral, let us recall that for any
function g, $\int_a^b g(x)\,dx$ is defined by

$$\int_a^b g(x)\,dx = \lim \sum_{i=1}^{n} g(x_i)(x_i - x_{i-1})$$

where the limit is taken over all $a = x_0 < x_1 < x_2 < \cdots < x_n = b$ as $n \to \infty$ and where
$\max_{i=1,\ldots,n} (x_i - x_{i-1}) \to 0$.

For any distribution function F, we define the Stieltjes integral of the nonnegative
function g over the interval [a, b] by

$$\int_a^b g(x)\,dF(x) = \lim \sum_{i=1}^{n} g(x_i)\,[F(x_i) - F(x_{i-1})]$$

where, as before, the limit is taken over all $a = x_0 < x_1 < \cdots < x_n = b$ as $n \to \infty$ and
where $\max_{i=1,\ldots,n} (x_i - x_{i-1}) \to 0$. Further, we define the Stieltjes integral over the
whole real line by

$$\int_{-\infty}^{\infty} g(x)\,dF(x) = \lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b g(x)\,dF(x)$$

Finally, if g is not a nonnegative function, we define $g^+$ and $g^-$ by

$$g^+(x) = \begin{cases} g(x) & \text{if } g(x) \ge 0 \\ 0 & \text{if } g(x) < 0 \end{cases}
\qquad
g^-(x) = \begin{cases} 0 & \text{if } g(x) \ge 0 \\ -g(x) & \text{if } g(x) < 0 \end{cases}$$

Because $g(x) = g^+(x) - g^-(x)$ and $g^+$ and $g^-$ are both nonnegative functions, it is
natural to define

$$\int_{-\infty}^{\infty} g(x)\,dF(x) = \int_{-\infty}^{\infty} g^+(x)\,dF(x) - \int_{-\infty}^{\infty} g^-(x)\,dF(x)$$

and we say that $\int_{-\infty}^{\infty} g(x)\,dF(x)$ exists as long as $\int_{-\infty}^{\infty} g^+(x)\,dF(x)$ and
$\int_{-\infty}^{\infty} g^-(x)\,dF(x)$ are not both equal to $+\infty$.


If X is an arbitrary random variable having cumulative distribution F, we define the 
expected value of X by 


(9.1)
$$E[X] = \int_{-\infty}^{\infty} x\,dF(x)$$

It can be shown that if X is a discrete random variable with mass function p(x), then

$$\int_{-\infty}^{\infty} x\,dF(x) = \sum_{x:\,p(x) > 0} x\,p(x)$$

whereas if X is a continuous random variable with density function f(x), then

$$\int_{-\infty}^{\infty} x\,dF(x) = \int_{-\infty}^{\infty} x f(x)\,dx$$


The reader should note that Equation (9.1) yields an intuitive definition of E[X];
consider the approximating sum

$$\sum_{i=1}^{n} x_i\,[F(x_i) - F(x_{i-1})]$$

of E[X]. Because $F(x_i) - F(x_{i-1})$ is just the probability that X will be in the interval
$(x_{i-1}, x_i]$, the approximating sum multiplies the approximate value of X when it is in
the interval $(x_{i-1}, x_i]$ by the probability that it will be in that interval and then sums
over all the intervals. Clearly, as these intervals get smaller and smaller in length, we
obtain the “expected value” of X.
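
To make the approximating sum concrete, the sketch below evaluates it for the mixed random variable W defined earlier in this section (W = 1 with probability 1/2, and otherwise uniform on [0, 1]). This is an illustration, not part of the original text: the function names and the equally spaced partition are assumptions, and the value E[W] = (1/2)(1) + (1/2)(1/2) = 3/4 used for comparison is obtained here by conditioning on X.

```python
def cdf_w(w):
    """Distribution function of W: a uniform part on [0, 1) plus a jump of 1/2 at w = 1."""
    if w < 0.0:
        return 0.0
    if w < 1.0:
        return 0.5 * w
    return 1.0


def stieltjes_sum(g, cdf, a, b, n):
    """Approximating sum of g(x_i) [F(x_i) - F(x_{i-1})] over n equal subintervals of [a, b]."""
    total = 0.0
    previous = a
    for i in range(1, n + 1):
        x = a + (b - a) * i / n
        total += g(x) * (cdf(x) - cdf(previous))
        previous = x
    return total


# W takes values in [0, 1] and F(0) = 0, so integrating over [0, 1] captures all the mass.
approx_expectation = stieltjes_sum(lambda x: x, cdf_w, 0.0, 1.0, 1000)
print(approx_expectation)  # should be close to 3/4
```

The jump of F at 1 is picked up by the last subinterval, which is how the sum handles the discrete part of W without a separate formula.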


Stieltjes integrals are mainly of theoretical interest because they yield a compact way 
of defining and dealing with the properties of expectation. For instance, the use of 
Stieltjes integrals avoids the necessity of having to give separate statements and 
proofs of theorems for the continuous and the discrete cases. However, their
properties are very much the same as those of ordinary integrals, and all of the
proofs presented in this chapter can easily be translated into proofs in the general 
case. 


Summary 


If X and Y have a joint probability mass function p(x, y), then 


$$E[g(X, Y)] = \sum_{y} \sum_{x} g(x, y)\,p(x, y)$$

whereas if they have a joint density function f(x, y), then

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy$$


A consequence of the preceding equations is that 


E[X + Y] = E[X] + E[Y] 


which generalizes to

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$$


The covariance between random variables X and Y is given by 


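$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])] \\
&= E[XY] - E[X]E[Y]
\end{aligned}$$

A useful identity is

$$\mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathrm{Cov}(X_i, Y_j)$$

When n = m and $Y_i = X_i$, $i = 1, \ldots, n$, the preceding formula gives

$$\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum\sum_{i < j} \mathrm{Cov}(X_i, X_j)$$

The correlation between X and Y, denoted by $\rho(X, Y)$, is defined by

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$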


If X and Y are jointly discrete random variables, then the conditional expected value 
of X, given that Y = y, is defined by 


$$E[X \mid Y = y] = \sum_{x} x\,P\{X = x \mid Y = y\}$$

If X and Y are jointly continuous random variables, then

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x \mid y)\,dx$$

where

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$


is the conditional probability density of X given that Y = y. Conditional expectations, 
which are similar to ordinary expectations except that all probabilities are now 
computed conditional on the event that Y = y, satisfy all the properties of ordinary 
expectations. 


Let E[X | Y] denote that function of Y whose value at Y = y is E[X | Y = y]. A very
useful identity is

$$E[X] = E[E[X \mid Y]]$$

In the case of discrete random variables, this equation reduces to the identity

$$E[X] = \sum_{y} E[X \mid Y = y]\,P\{Y = y\}$$

and, in the continuous case, to

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy$$


The preceding equations can often be applied to obtain E[X] by first “conditioning” on
the value of some other random variable Y. In addition, since, for any event A,
$P(A) = E[I_A]$, where $I_A$ is 1 if A occurs and is 0 otherwise, we can use the same
equations to compute probabilities.


The conditional variance of X, given that Y = y, is defined by 


$$\mathrm{Var}(X \mid Y = y) = E[(X - E[X \mid Y = y])^2 \mid Y = y]$$


Let Var(X | Y) be that function of Y whose value at Y = y is Var(X|Y = y). The 
following is known as the conditional variance formula: 


$$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$$
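
Both the conditioning identity for E[X] and the conditional variance formula can be verified exactly on a small discrete example. The sketch below is illustrative: the joint mass function is hypothetical, chosen only so that the arithmetic is easy to follow, and the function names are assumptions. Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding.

```python
from fractions import Fraction

# Hypothetical joint mass function p(x, y), chosen only for illustration.
joint_pmf = {
    (0, 0): Fraction(1, 10),
    (1, 0): Fraction(3, 10),
    (0, 1): Fraction(2, 10),
    (1, 1): Fraction(1, 10),
    (2, 1): Fraction(3, 10),
}


def expectation(g):
    """E[g(X, Y)] as the double sum of g(x, y) p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())


def marginal_y(y):
    return sum(p for (_, y2), p in joint_pmf.items() if y2 == y)


def conditional_mean(y):
    """E[X | Y = y]."""
    return sum(x * p for (x, y2), p in joint_pmf.items() if y2 == y) / marginal_y(y)


def conditional_variance(y):
    """Var(X | Y = y)."""
    m = conditional_mean(y)
    return sum((x - m) ** 2 * p for (x, y2), p in joint_pmf.items() if y2 == y) / marginal_y(y)


mean_x = expectation(lambda x, y: x)
var_x = expectation(lambda x, y: (x - mean_x) ** 2)

y_values = sorted({y for (_, y) in joint_pmf})
tower_mean = sum(conditional_mean(y) * marginal_y(y) for y in y_values)
expected_cond_var = sum(conditional_variance(y) * marginal_y(y) for y in y_values)
var_of_cond_mean = sum((conditional_mean(y) - tower_mean) ** 2 * marginal_y(y)
                       for y in y_values)

print(mean_x, tower_mean)                           # the two values agree
print(var_x, expected_cond_var + var_of_cond_mean)  # the two values agree
```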


Suppose that the random variable X is to be observed and, on the basis of its value, 
one must then predict the value of the random variable Y. In such a situation, it turns 
out that among all predictors, E[Y |X] has the smallest expectation of the square of 
the difference between it and Y. 


The moment generating function of the random variable X is defined by 


$$M(t) = E[e^{tX}]$$

The moments of X can be obtained by successively differentiating M(t) and then
evaluating the resulting quantity at t = 0. Specifically, we have

$$E[X^n] = \frac{d^n}{dt^n} M(t)\bigg|_{t=0}, \quad n = 1, 2, \ldots$$
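
As a brief worked illustration of this recipe (added here as a sketch, using the Poisson moment generating function $M(t) = e^{\lambda(e^t - 1)}$ that appears in Example 7m), differentiating twice and setting t = 0 gives the first two moments and hence the variance:

```latex
\begin{aligned}
M(t)   &= e^{\lambda(e^{t} - 1)} \\
M'(t)  &= \lambda e^{t}\, e^{\lambda(e^{t} - 1)}
          &&\Rightarrow\quad E[X] = M'(0) = \lambda \\
M''(t) &= \left(\lambda e^{t} + \lambda^{2} e^{2t}\right) e^{\lambda(e^{t} - 1)}
          &&\Rightarrow\quad E[X^{2}] = M''(0) = \lambda + \lambda^{2} \\
\mathrm{Var}(X) &= E[X^{2}] - (E[X])^{2} = \lambda
\end{aligned}
```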


Two useful results concerning moment generating functions are, first, that the 
moment generating function uniquely determines the distribution function of the 
random variable and, second, that the moment generating function of the sum of 
independent random variables is equal to the product of their moment generating 
functions. These results lead to simple proofs that the sum of independent normal 
(Poisson, gamma) random variables remains a normal (Poisson, gamma) random 
variable. 


If $X_1, \ldots, X_m$ are all linear combinations of a finite set of independent standard normal
random variables, then they are said to have a multivariate normal distribution. Their
joint distribution is specified by the values of $E[X_i]$, $\mathrm{Cov}(X_i, X_j)$, $i, j = 1, \ldots, m$.

If $X_1, \ldots, X_n$ are independent and identically distributed normal random variables, then
their sample mean

$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n}$$

and their sample variance

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

are independent. The sample mean $\bar{X}$ is a normal random variable with mean $\mu$ and
variance $\sigma^2/n$; the random variable $(n - 1)S^2/\sigma^2$ is a chi-squared random variable
with n - 1 degrees of freedom.


Problems 


7.1 A player throws a fair die and simultaneously flips a fair coin. If 
the coin lands heads, then she wins twice, and if tails, then she wins 
one-half of the value that appears on the die. Determine her 
expected winnings. 
7.2 The game of Clue involves 6 suspects, 6 weapons, and 9 rooms. 
One of each is randomly chosen and the object of the game is to 
guess the chosen three. 
a. How many solutions are possible? 
In one version of the game, the selection is made and then 
each of the players is randomly given three of the remaining 
cards. Let S, W, and R be, respectively, the numbers of 
suspects, weapons, and rooms in the set of three cards given 
to a specified player. Also, let X denote the number of 
solutions that are possible after that player observes his or her 
three cards. 
b. Express X in terms of S, W, and R. 
c. Find E[X]. 


7.3 Gambles are independent, and each one results in the player 
being equally likely to win or lose 1 unit. Let W denote the net 
winnings of a gambler whose strategy is to stop gambling 
immediately after his first win. Find 

a. P{W > 0} 

b. P{W <0} 

c. E[W]
7.4. If X and Y have joint density function

$$f_{X,Y}(x, y) = \begin{cases} 1/y, & \text{if } 0 < y < 1,\ 0 < x < y \\ 0, & \text{otherwise} \end{cases}$$


7.5 The county hospital is located at the center of a square whose 
sides are 3 miles wide. If an accident occurs within this square, then 
the hospital sends out an ambulance. The road network is 
rectangular, so the travel distance from the hospital, whose 
coordinates are (0, 0), to the point (x, y) is |x| + |y|. If an accident 
occurs at a point that is uniformly distributed in the square, find the 
expected travel distance of the ambulance. 
7.6 A fair die is rolled 10 times. Calculate the expected sum of the 10 
rolls. 
7.7 Suppose that A and B each randomly and independently choose 
3 of 10 objects. Find the expected number of objects 

a. chosen by both A and B; 

b. not chosen by either A or B; 

c. chosen by exactly one of A and B. 


7.8. N people arrive separately to a professional dinner. Upon arrival, 
each person looks to see if he or she has any friends among those 
present. That person then sits either at the table of a friend or at an 
unoccupied table if none of those present is a friend. Assuming that
each of the $\binom{N}{2}$ pairs of people is, independently, a pair of friends
with probability p, find the expected number of occupied tables.
Hint: Let X; equal 1 or 0, depending on whether the ith arrival sits at 
a previously unoccupied table. 
7.9. A total of n balls, numbered 1 through n, are put into n urns, also 
numbered 1 through n in such a way that ball i is equally likely to go 
into any of the urns 1, 2,..., i. Find 

a. the expected number of urns that are empty; 

b. the probability that none of the urns is empty. 


7.10 Consider 3 trials, each having the same probability of success.
Let X denote the total number of successes in these trials. If
E[X] = 1.8, what is
a. the largest possible value of P{X = 3}?
b. the smallest possible value of P{X = 3}?

In both cases, construct a probability scenario that results in
P{X = 3} having the stated value.
Hint: For part (b), you might start by letting U be a uniform random
variable on (0, 1) and then defining the trials in terms of the value of
U.

7.11 Consider n independent flips of a coin having probability p of 
landing on heads. Say that a changeover occurs whenever an 
outcome differs from the one preceding it. For instance, if n = 5 and
the outcome is HHTHT, then there are 3 changeovers. Find the 
expected number of changeovers. 
Hint: Express the number of changeovers as the sum of n — 1 
Bernoulli random variables. 
7.12. A group of n men and n women is lined up at random. 
a. Find the expected number of men who have a woman next to 
them. 
b. Repeat part (a), but now assuming that the group is randomly 
seated at a round table. 


7.13. A set of 1000 cards numbered 1 through 1000 is randomly 
distributed among 1000 people with each receiving one card. 
Compute the expected number of cards that are given to people 
whose age matches the number on the card. 

7.14. An urn has m black balls. At each stage, a black ball is 
removed and a new ball that is black with probability p and white with 
probability 1 - p is put in its place. Find the expected number of
stages needed until there are no more black balls in the urn. 

NOTE: The preceding has possible applications to understanding the 
AIDS disease. Part of the body’s immune system consists of a 
certain class of cells known as T-cells. There are 2 types of T-cells, 
called CD4 and CD8. Now, while the total number of T-cells in AIDS 
sufferers is (at least in the early stages of the disease) the same as 
that in healthy individuals, it has recently been discovered that the 
mix of CD4 and CD8 T-cells is different. Roughly 60 percent of the 
T-cells of a healthy person are of the CD4 type, whereas the 
percentage of the T-cells that are of CD4 type appears to decrease 
continually in AIDS sufferers. A recent model proposes that the HIV 
virus (the virus that causes AIDS) attacks CD4 cells and that the 
body’s mechanism for replacing killed T-cells does not differentiate
between whether the killed T-cell was CD4 or CD8. Instead, it just
produces a new T-cell that is CD4 with probability .6 and CD8 with 
probability .4. However, although this would seem to be a very 
efficient way of replacing killed T-cells when each one killed is 
equally likely to be any of the body’s T-cells (and thus has probability 
.6 of being CD4), it has dangerous consequences when facing a 
virus that targets only the CD4 T-cells. 

7.15. In Example 2h, say that i and j, $i \ne j$, form a matched pair if
i chooses the hat belonging to j and j chooses the hat belonging to i. 
Find the expected number of matched pairs. 

7.16. Let Z be a standard normal random variable, and, for a fixed x,
set

$$X = \begin{cases} Z & \text{if } Z > x \\ 0 & \text{otherwise} \end{cases}$$

Show that $E[X] = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$.


7.17. A deck of n cards numbered 1 through n is thoroughly shuffled
so that all possible n! orderings can be assumed to be equally likely. 
Suppose you are to make n guesses sequentially, where the ith one 
is a guess of the card in position i. Let N denote the number of 
correct guesses. 
a. If you are not given any information about your earlier 
guesses, show that for any strategy, E[N] = 1. 
b. Suppose that after each guess you are shown the card that
was in the position in question. What do you think is the best
strategy? Show that under this strategy,

$$E[N] = \frac{1}{n} + \frac{1}{n-1} + \cdots + 1 \approx \int_1^n \frac{1}{x}\,dx = \log n$$

c. Suppose that you are told after each guess whether you are
right or wrong. In this case, it can be shown that the strategy
that maximizes E[N] is one that keeps on guessing the same
card until you are told you are correct and then changes to a
new card. For this strategy, show that

$$E[N] = 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} \approx e - 1$$

Hint: For all parts, express N as the sum of indicator (that is,
Bernoulli) random variables.

7.18. Cards from an ordinary deck of 52 playing cards are turned
face up one at a time. If the 1st card is an ace, or the 2nd a deuce, or
the 3rd a three, or ..., or the 13th a king, or the 14th an ace, and so on,
we say that a match occurs. Note that we do not require that the
(13n + 1)th card be any particular ace for a match to occur but only that
it be an ace. Compute the expected number of matches that occur.
7.19. A certain region is inhabited by r distinct types of a certain 
species of insect. Each insect caught will, independently of the types 
of the previous catches, be of type i with probability

$$P_i,\ i = 1, \ldots, r, \qquad \sum_{i=1}^{r} P_i = 1$$


a. Compute the mean number of insects that are caught before 
the first type 1 catch. 

b. Compute the mean number of types of insects that are caught 
before the first type 1 catch. 


7.20 In an urn containing n balls, the ith ball has weight
W(i), i = 1,...,n. The balls are removed without replacement, one at a
time, according to the following rule: At each selection, the probability
that a given ball in the urn is chosen is equal to its weight divided by
the sum of the weights remaining in the urn. For instance, if at some
time $i_1, \ldots, i_r$ is the set of balls remaining in the urn, then the next
selection will be $i_j$ with probability $W(i_j) \big/ \sum_{k=1}^{r} W(i_k)$, $j = 1, \ldots, r$.
Compute the expected number of balls that are withdrawn before ball
number 1 is removed.
7.21. For a group of 100 people, compute 

a. the expected number of days of the year that are birthdays of 

exactly 3 people; 
b. the expected number of distinct birthdays. 


7.22. How many times would you expect to roll a fair die before all 6 
sides appeared at least once? 

7.23. Urn 1 contains 5 white and 6 black balls, while urn 2 contains 8 
white and 10 black balls. Two balls are randomly selected from urn 1 
and are put into urn 2. If 3 balls are then randomly selected from urn 
2, compute the expected number of white balls in the trio. 

Hint: Let $X_i = 1$ if the ith white ball initially in urn 1 is one of the three
selected, and let $X_i = 0$ otherwise. Similarly, let $Y_i = 1$ if the ith white
ball from urn 2 is one of the three selected, and let $Y_i = 0$ otherwise.
The number of white balls in the trio can now be written as

$$\sum_{i=1}^{5} X_i + \sum_{i=1}^{8} Y_i$$
7.24. A bottle initially contains m large pills and n small pills. Each 
day, a patient randomly chooses one of the pills. If a small pill is 
chosen, then that pill is eaten. If a large pill is chosen, then the pill is 
broken in two; one part is returned to the bottle (and is now 
considered a small pill) and the other part is then eaten. 
a. Let X denote the number of small pills in the bottle after the 
last large pill has been chosen and its smaller half returned. 
Find E[X]. 
Hint: Define n + m indicator variables, one for each of the 
small pills initially present and one for each of the m small pills 
created when a large one is split in two. Now use the 
argument of Example 2m.
b. Let Y denote the day on which the last large pill is chosen. 
Find E[Y]. 
Hint: What is the relationship between X and Y? 


7.25. Let $X_1, X_2, \ldots$ be a sequence of independent and identically
distributed continuous random variables. Let $N \ge 2$ be such that

$$X_1 \ge X_2 \ge \cdots \ge X_{N-1} < X_N$$

That is, N is the point at which the sequence stops decreasing. Show
that E[N] = e.
Hint: First find $P\{N \ge n\}$.
7.26. If $X_1, X_2, \ldots, X_n$ are independent and identically distributed
random variables having uniform distributions over (0, 1), find

a. $E[\max(X_1, \ldots, X_n)]$;

b. $E[\min(X_1, \ldots, X_n)]$.


* 7.27. If 101 items are distributed among 10 boxes, then at least one
of the boxes must contain more than 10 items. Use the probabilistic 
method to prove this result. 

* 7.28. The k-of-r-out-of-n circular reliability system, $k \le r \le n$,
consists of n components that are arranged in a circular fashion.
Each component is either functional or failed, and the system
functions if there is no block of r consecutive components of which at
least k are failed. Show that there is no way to arrange 47
components, 8 of which are failed, to make a functional
3-of-12-out-of-47 circular system.
* 7.29. There are 4 different types of coupons, the first 2 of which 
comprise one group and the second 2 another group. Each new 
coupon obtained is type i with probability $p_i$, where
$p_1 = p_2 = 1/8$, $p_3 = p_4 = 3/8$. Find the expected number of coupons
that one must obtain to have at least one of 

a. all 4 types; 

b. all the types of the first group; 

c. all the types of the second group; 

d. all the types of either group. 


* 7.30. If X and Y are independent and identically distributed with
mean $\mu$ and variance $\sigma^2$, find

$$E[(X - Y)^2]$$

7.31. In Problem 7.6, calculate the variance of the sum of the
rolls.
7.32. In Problem 7.9, compute the variance of the number of
empty urns.
7.33. If E[X] = 1 and Var(X) = 5, find

a. $E[(2 + X)^2]$;

b. Var(4 + 3X).


7.34 If 10 married couples are randomly seated at a round table, 
compute (a) the expected number and (b) the variance of the number 
of wives who are seated next to their husbands. 
7.35 Cards from an ordinary deck are turned face up one at a time. 
Compute the expected number of cards that need to be turned face 
up in order to obtain 

a. 2 aces; 

b. 5 spades; 

c. all 13 hearts. 


7.36. Let X be the number of 1’s and Y the number of 2’s that occur 
in n rolls of a fair die. Compute Cov(X,Y). 

7.37. A die is rolled twice. Let X equal the sum of the outcomes, and 
let Y equal the first outcome minus the second. Compute Cov(X, Y). 
7.38. Suppose X and Y have the following joint probability mass
function.

p(1, 1) = .10, p(1, 2) = .12, p(1, 3) = .16
p(2, 1) = .08, p(2, 2) = .12, p(2, 3) = .10
p(3, 1) = .06, p(3, 2) = .06, p(3, 3) = .20


a. Find E[X] and E[Y]. 

b. Find Var(X) and Var(Y). 

c. Find Cov(X,Y). 

d. Find the correlation between X and Y. 


7.39 Suppose that 2 balls are randomly removed from an urn 
containing n red and m blue balls. Let $X_i = 1$ if the ith ball removed is
red, and let it be 0 otherwise, i = 1, 2.
a. Do you think that $\mathrm{Cov}(X_1, X_2)$ is negative, zero, or positive?
b. Validate your answer to part (a).
Suppose the red balls are numbered, and let $Y_i$ equal 1 if red
ball number i is removed, and let it be 0 if that ball is not
removed.
c. Do you think that $\mathrm{Cov}(Y_1, Y_2)$ is negative, zero, or positive?
d. Validate your answer to part (c). 


7.40. The random variables X and Y have a joint density function
given by

$$f(x, y) = \begin{cases} 2e^{-2x}/x & 0 \le x < \infty,\ 0 \le y \le x \\ 0 & \text{otherwise} \end{cases}$$

Compute Cov(X,Y).

7.41. Let $X_1, \ldots$ be independent with common mean $\mu$ and common
variance $\sigma^2$, and set $Y_n = X_n + X_{n+1} + X_{n+2}$. For $j \ge 0$, find
$\mathrm{Cov}(Y_n, Y_{n+j})$.

7.42. The joint density function of X and Y is given by

$$f(x, y) = \frac{1}{y}\, e^{-(y + x/y)}, \quad x > 0,\ y > 0$$

Find E[X], E[Y], and show that Cov(X,Y) = 1.

7.43. A pond contains 100 fish, of which 30 are carp. If 20 fish are 
caught, what are the mean and variance of the number of carp 
among the 20? What assumptions are you making? 

7.44. A group of 20 people consisting of 10 men and 10 women is 
randomly arranged into 10 pairs of 2 each. Compute the expectation 
and variance of the number of pairs that consist of a man and a 
woman. Now suppose the 20 people consist of 10 married couples. 


Compute the mean and variance of the number of married couples 
that are paired together. 
7.45. Let $X_1, X_2, \ldots, X_n$ be independent random variables having an
unknown continuous distribution function F, and let $Y_1, Y_2, \ldots, Y_m$ be
independent random variables having an unknown continuous
distribution function G. Now order those n + m variables, and let

$$I_i = \begin{cases} 1 & \text{if the } i\text{th smallest of the } n + m \text{ variables is from the } X \text{ sample} \\ 0 & \text{otherwise} \end{cases}$$

The random variable $R = \sum_{i=1}^{n+m} i I_i$ is the sum of the ranks of the X
sample and is the basis of a standard statistical procedure (called
the Wilcoxon sum-of-ranks test) for testing whether F and G are
identical distributions. This test accepts the hypothesis that F = G
when R is neither too large nor too small. Assuming that the
hypothesis of equality is in fact correct, compute the mean and
variance of R.
Hint: Use the results of Example 3e.
7.46. Between two distinct methods for manufacturing certain goods, 
the quality of goods produced by method i is a continuous random 
variable having distribution $F_i$, i = 1, 2. Suppose that n goods are
produced by method 1 and m by method 2. Rank the n + m goods
according to quality, and let

$$X_i = \begin{cases} 1 & \text{if the } i\text{th best was produced from method 1} \\ 2 & \text{otherwise} \end{cases}$$

For the vector $X_1, X_2, \ldots, X_{n+m}$, which consists of n 1's and m 2's, let R
denote the number of runs of 1. For instance, if n = 5, m = 2, and
X = 1, 2, 1, 1, 1, 1, 2, then R = 2. If $F_1 = F_2$ (that is, if the two methods
produce identically distributed goods), what are the mean and
variance of R?
7.47. If $X_1$, $X_2$, $X_3$, and $X_4$ are (pairwise) uncorrelated random
variables, each having mean 0 and variance 1, compute the
correlations of

a. $X_1 + X_2$ and $X_2 + X_3$;

b. $X_1 + X_2$ and $X_3 + X_4$.

7.48. Consider the following dice game, as played at a certain
gambling casino: Players 1 and 2 roll a pair of dice in turn. The bank
then rolls the dice to determine the outcome according to the 
following rule: Player i, i = 1, 2, wins if his roll is strictly greater than 
the bank’s. For i = 1, 2, let 
_ {1 ifi wins 
ae to otherwise 


and show that /, and J, are positively correlated. Explain why this 
result was to be expected. 
7.49. Consider a graph having $n$ vertices labeled $1, 2, \ldots, n$, and
suppose that, between each of the $\binom{n}{2}$ pairs of distinct vertices, an
edge is independently present with probability $p$. The degree of
vertex $i$, designated as $D_i$, is the number of edges that have vertex $i$
as one of their vertices.

a. What is the distribution of $D_i$?

b. Find $\rho(D_i, D_j)$, the correlation between $D_i$ and $D_j$.


7.50. A fair die is successively rolled. Let X and Y denote,
respectively, the number of rolls necessary to obtain a 6 and a 5.
Find

a. E[X];
b. E[X|Y = 1];
c. E[X|Y = 5].


7.51. There are two misshapen coins in a box; their probabilities for 
landing on heads when they are flipped are, respectively, .4 and .7. 
One of the coins is to be randomly chosen and flipped 10 times. 
Given that two of the first three flips landed on heads, what is the 
conditional expected number of heads in the 10 flips? 


7.52. The joint density of X and Y is given by

\[
f(x, y) = \frac{e^{-x/y} e^{-y}}{y}, \qquad 0 < x < \infty,\ 0 < y < \infty
\]

Compute $E[X^2 \mid Y = y]$.
7.53. The joint density of X and Y is given by

\[
f(x, y) = \frac{e^{-y}}{y}, \qquad 0 < x < y,\ 0 < y < \infty
\]

Compute $E[X^3 \mid Y = y]$.
7.54. A population is made up of $r$ disjoint subgroups. Let $p_i$ denote
the proportion of the population that is in subgroup $i$, $i = 1, \ldots, r$. If the
average weight of the members of subgroup $i$ is $w_i$, $i = 1, \ldots, r$, what is
the average weight of the members of the population?
7.55. A prisoner is trapped in a cell containing 3 doors. The first door 
leads to a tunnel that returns him to his cell after 2 days’ travel. The 
second leads to a tunnel that returns him to his cell after 4 days’ 
travel. The third door leads to freedom after 1 day of travel. If it is 
assumed that the prisoner will always select doors 1, 2, and 3 with 
respective probabilities .5, .3, and .2, what is the expected number of 
days until the prisoner reaches freedom? 
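
One way to sanity-check an answer to Problem 7.55 is to simulate the escape directly. The sketch below is only an illustration; the function names `days_until_freedom` and `estimate_expected_days`, the trial count, and the seed are assumptions of this example and not part of the text. It returns a Monte Carlo estimate that can be compared with the value obtained by conditioning on the first door chosen.

```python
import random


def days_until_freedom(rng):
    """Simulate one escape; doors 1, 2, 3 are chosen with probabilities .5, .3, .2."""
    days = 0
    while True:
        u = rng.random()
        if u < 0.5:
            days += 2          # door 1: back in the cell after 2 days
        elif u < 0.8:
            days += 4          # door 2: back in the cell after 4 days
        else:
            return days + 1    # door 3: freedom after 1 more day


def estimate_expected_days(trials=100_000, seed=0):
    """Average the simulated escape time over many independent runs."""
    rng = random.Random(seed)
    return sum(days_until_freedom(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(estimate_expected_days())
```

The estimate fluctuates from seed to seed, so it should be read as a rough check on the exact calculation rather than a replacement for it.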
7.56. Consider the following dice game: A pair of dice is rolled. If the 
sum is 7, then the game ends and you win 0. If the sum is not 7, then 
you have the option of either stopping the game and receiving an 
amount equal to that sum or starting over again. For each value of 
$i$, $i = 2, \ldots, 12$, find your expected return if you employ the strategy of
stopping the first time that a value at least as large as $i$ appears.
What value of $i$ leads to the largest expected return?
Hint: Let $X_i$ denote the return when you use the critical value $i$. To
compute $E[X_i]$, condition on the initial sum.
7.57. Ten hunters are waiting for ducks to fly by. When a flock of 
ducks flies overhead, the hunters fire at the same time, but each 
chooses his target at random, independently of the others. If each 
hunter independently hits his target with probability .6, compute the 
expected number of ducks that are hit. Assume that the number of 
ducks in a flock is a Poisson random variable with mean 6. 
7.58. The number of people who enter an elevator on the ground 
floor is a Poisson random variable with mean 10. If there are N floors 
above the ground floor, and if each person is equally likely to get off 
at any one of the N floors, independently of where the others get off, 
compute the expected number of stops that the elevator will make 
before discharging all of its passengers. 
7.59. Suppose that the expected number of accidents per week at an 
industrial plant is 5. Suppose also that the numbers of workers 
injured in each accident are independent random variables with a 
common mean of 2.5. If the number of workers injured in each 
accident is independent of the number of accidents that occur, 
compute the expected number of workers injured in a week. 
7.60. A coin having probability p of coming up heads is continually 
flipped until both heads and tails have appeared. Find 

a. the expected number of flips; 

b. the probability that the last flip lands on heads.
7.61. A coin that comes up heads with probability p is continually
flipped. Let N be the number of flips until there have been both at 
least n heads and at least m tails. Derive an expression for E[N] by 
conditioning on the number of heads in the first n + m flips. 

7.62. There are n + 1 participants in a game. Each person 
independently is a winner with probability p. The winners share a 
total prize of 1 unit. (For instance, if 4 people win, then each of them
receives $\frac{1}{4}$, whereas if there are no winners, then none of the
participants receives anything.) Let A denote a specified one of the
players, and let X denote the amount that is received by A.
a. Compute the expected total prize shared by the players.
b. Argue that $E[X] = \dfrac{1 - (1 - p)^{n+1}}{n + 1}$.
c. Compute E[X] by conditioning on whether A is a winner, and
conclude that

\[
E\left[(1 + B)^{-1}\right] = \frac{1 - (1 - p)^{n+1}}{(n + 1)p}
\]

when B is a binomial random variable with parameters n and p.
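
The identity in part (c) of Problem 7.62 can be checked exactly for particular values of n and p. The following sketch sums 1/(1 + k) against the binomial probability mass function using rational arithmetic and compares the result with (1 - (1 - p)^(n+1)) / ((n + 1)p). The function names are illustrative assumptions, and a check of this kind confirms the formula for specific inputs only; it does not replace the conditioning argument the problem asks for.

```python
from fractions import Fraction
from math import comb


def expected_inverse_one_plus_binomial(n, p):
    """Exact E[1/(1+B)] for B ~ Binomial(n, p), summed over the pmf."""
    p = Fraction(p)
    return sum(
        Fraction(comb(n, k)) * p**k * (1 - p) ** (n - k) / (1 + k)
        for k in range(n + 1)
    )


def closed_form(n, p):
    """(1 - (1 - p)^(n+1)) / ((n + 1) p), the right-hand side of part (c)."""
    p = Fraction(p)
    return (1 - (1 - p) ** (n + 1)) / ((n + 1) * p)


if __name__ == "__main__":
    for n, p in [(1, Fraction(1, 2)), (5, Fraction(1, 3)), (10, Fraction(7, 10))]:
        print(n, p, expected_inverse_one_plus_binomial(n, p) == closed_form(n, p))
```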

7.63. Each of m + 2 players pays 1 unit to a kitty in order to play the 
following game: A fair coin is to be flipped successively n times, 
where n is an odd number, and the successive outcomes are noted. 
Before the n flips, each player writes down a prediction of the 
outcomes. For instance, if n = 3, then a player might write down 
(H,H,T), which means that he or she predicts that the first flip will 
land on heads, the second on heads, and the third on tails. After the 
coins are flipped, the players count their total number of correct 
predictions. Thus, if the actual outcomes are all heads, then the 
player who wrote (H,H,T), would have 2 correct predictions. The 
total kitty of m + 2 is then evenly split up among those players having 
the largest number of correct predictions. 

Since each of the coin flips is equally likely to land on either heads or 
tails, m of the players have decided to make their predictions in a 
totally random fashion. Specifically, they will each flip one of their 
own fair coins n times and then use the result as their prediction. 
However, the final 2 of the players have formed a syndicate and will 
use the following strategy: One of them will make predictions in the 
same random fashion as the other m players, but the other one will 
then predict exactly the opposite of the first. That is, when the 
randomizing member of the syndicate predicts an H, the other
member predicts a T. For instance, if the randomizing member of the
syndicate predicts (H,H,T), then the other one predicts (T,T,H).

a. Argue that exactly one of the syndicate members will have 
more than n/2 correct predictions. (Remember, n is odd.) 

b. Let X denote the number of the m nonsyndicate players who 
have more than n/2 correct predictions. What is the 
distribution of X? 

c. With X as defined in part (b), argue that

\[
E[\text{payoff to the syndicate}] = (m + 2)\, E\left[\frac{1}{X + 1}\right]
\]

d. Use part (c) of Problem 7.62 to conclude that

\[
E[\text{payoff to the syndicate}] = \frac{2(m + 2)}{m + 1}\left[1 - \left(\frac{1}{2}\right)^{m+1}\right]
\]

and explicitly compute this number when m = 1, 2, and 3. Because it
can be shown that

\[
\frac{2(m + 2)}{m + 1}\left[1 - \left(\frac{1}{2}\right)^{m+1}\right] > 2
\]

it follows that the syndicate’s strategy always gives it a positive
expected profit.
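
The game in Problem 7.63 is easy to simulate, which gives an informal check on the closed form 2(m + 2)/(m + 1) · [1 − (1/2)^(m+1)] for the syndicate's expected payoff. In the sketch below the names `play_once`, `estimate_payoff`, and `stated_payoff`, as well as the trial count and seed, are assumptions made for illustration. Each round splits the kitty of m + 2 evenly among the players with the most correct predictions and records the syndicate's share.

```python
import random


def play_once(m, n, rng):
    """One round of the game; returns the amount paid to the two-player syndicate."""
    outcome = [rng.random() < 0.5 for _ in range(n)]
    # m independent random predictors, followed by the syndicate's random member.
    predictions = [[rng.random() < 0.5 for _ in range(n)] for _ in range(m + 1)]
    # The second syndicate member predicts the exact opposite of the first.
    predictions.append([not guess for guess in predictions[-1]])
    scores = [sum(g == o for g, o in zip(pred, outcome)) for pred in predictions]
    best = max(scores)
    winners = [k for k, s in enumerate(scores) if s == best]
    syndicate_winners = sum(1 for k in winners if k >= m)  # indices m and m + 1
    return (m + 2) * syndicate_winners / len(winners)


def estimate_payoff(m, n, trials=50_000, seed=0):
    """Monte Carlo estimate of the syndicate's expected payoff."""
    rng = random.Random(seed)
    return sum(play_once(m, n, rng) for _ in range(trials)) / trials


def stated_payoff(m):
    """Closed form 2(m+2)/(m+1) * (1 - (1/2)^(m+1)) for the expected payoff."""
    return 2 * (m + 2) / (m + 1) * (1 - 0.5 ** (m + 1))


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, estimate_payoff(m, n=5), stated_payoff(m))
```

Since the syndicate pays 2 units to enter, an estimate above 2 is consistent with the positive expected profit claimed in the problem.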
7.64. The number of goals that J scores in soccer games that her 
team wins is Poisson distributed with mean 2, while the number she 
scores in games that her team loses is Poisson distributed with mean 
1. Assume that, independent of earlier results, J’s team wins each 
new game it plays with probability p. 

a. Find the expected number of goals that J scores in her team’s 

next game. 
b. Find the probability that J scores 6 goals in her next 4 games. 


Hint: Would it be useful to know how many of those games were
won by J’s team?
Suppose J’s team has just entered a tournament in which it will
continue to play games until it loses. Let X denote the total number of
goals scored by J in the tournament. Also, let N be the number of
games her team plays in the tournament.

c. Find E[X].
d. Find P(X = 0).
e. Find P(N = 3|X = 5).


7.65. If the level of infection of a tree is x, then each treatment will 
independently be successful with probability 1 — x. Consider a tree 
whose infection level is assumed to be the value of a uniform (0, 1) 
random variable. 
a. Find the probability that a single treatment will result in a cure. 
b. Find the probability that the first two treatments are 
unsuccessful. 
c. Find the probability it will take n treatments for the tree to be 
cured. 


7.66. Let $X_1, X_2, \ldots$ be independent random variables with the common
distribution function F, and suppose they are independent of N, a
geometric random variable with parameter p. Let $M = \max(X_1, \ldots, X_N)$.

a. Find P{M < x} by conditioning on N. 

b. Find P{M < x|N = 1}. 

c. Find P{M < x|N > 1}. 

d. Use (b) and (c) to rederive the probability you found in (a). 


7.67. Let $U_1, U_2, \ldots$ be a sequence of independent uniform (0, 1)
random variables. In Example 5i, we showed that for
$0 \le x \le 1$, $E[N(x)] = e^x$, where

\[
N(x) = \min\left\{n : \sum_{i=1}^{n} U_i > x\right\}
\]

This problem gives another approach to establishing that result.
a. Show by induction on n that for $0 < x \le 1$ and all $n \ge 0$,

\[
P\{N(x) \ge n + 1\} = \frac{x^n}{n!}
\]

Hint: First condition on $U_1$ and then use the induction hypothesis.
b. Use part (a) to conclude that

\[
E[N(x)] = e^x
\]
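
The result E[N(x)] = e^x in Problem 7.67 can be checked numerically by generating uniforms until their running sum exceeds x. The sketch below is a minimal illustration; the function names, the trial count, and the seed are assumptions of this example.

```python
import math
import random


def n_of_x(x, rng):
    """N(x): the number of uniform (0, 1) draws needed for the running sum to exceed x."""
    total, n = 0.0, 0
    while total <= x:
        total += rng.random()
        n += 1
    return n


def estimate_mean_n(x, trials=100_000, seed=0):
    """Monte Carlo estimate of E[N(x)]."""
    rng = random.Random(seed)
    return sum(n_of_x(x, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for x in (0.25, 0.5, 1.0):
        print(x, estimate_mean_n(x), math.exp(x))
```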


7.68. An urn contains 30 balls, of which 10 are red and 8 are blue. 
From this urn, 12 balls are randomly withdrawn. Let X denote the 
number of red and Y the number of blue balls that are withdrawn. 
Find Cov(X, Y)
a. by defining appropriate indicator (that is, Bernoulli) random
variables $X_i$, $Y_j$ such that $X = \sum_{i=1}^{10} X_i$, $Y = \sum_{j=1}^{8} Y_j$;

b. by conditioning (on either X or Y) to determine E[XY].


7.69. Type i light bulbs function for a random amount of time having 
mean $\mu_i$ and standard deviation $\sigma_i$, $i = 1, 2$. A light bulb randomly
chosen from a bin of bulbs is a type 1 bulb with probability p and a 
type 2 bulb with probability 1 — p. Let X denote the lifetime of this 
bulb. Find 

a. E[X]; 

b. Var(X). 


7.70. The number of winter storms in a good year is a Poisson 
random variable with mean 3, whereas the number in a bad year is a 
Poisson random variable with mean 5. If next year will be a good 
year with probability .4 or a bad year with probability .6, find the 
expected value and variance of the number of storms that will occur. 
7.71. In Example 5c, compute the variance of the length of time
until the miner reaches safety. 

7.72. Consider a gambler who, at each gamble, either wins or loses 
her bet with respective probabilities p and 1 — p. A popular gambling 
system known as the Kelley strategy is to always bet the fraction
$2p - 1$ of your current fortune when $p > \frac{1}{2}$. Compute the expected
fortune after n gambles of a gambler who starts with x units and
employs the Kelley strategy.

7.73. The number of accidents that a person has in a given year is a 
Poisson random variable with mean $\lambda$. However, suppose that the
value of $\lambda$ changes from person to person, being equal to 2 for 60
percent of the population and 3 for the other 40 percent. If a person 
is chosen at random, what is the probability that he will have (a) 0 
accidents and (b) exactly 3 accidents in a certain year? What is the 
conditional probability that he will have 3 accidents in a given year, 
given that he had no accidents the preceding year? 

7.74. Repeat Problem 7.73 when the proportion of the population
having a value of $\lambda$ less than x is equal to $1 - e^{-x}$.

7.75. Consider an urn containing a large number of coins, and
suppose that each of the coins has some probability p of turning up
heads when it is flipped. However, this value of p varies from coin to
coin. Suppose that the composition of the urn is such that if a coin is 
selected at random from it, then the p-value of the coin can be 
regarded as being the value of a random variable that is uniformly 
distributed over [0, 1]. If a coin is selected at random from the urn 
and flipped twice, compute the probability that 

a. the first flip results in a head; 

b. both flips result in heads. 


7.76. In Problem 7.75, suppose that the coin is tossed n times.
Let X denote the number of heads that occur. Show that

\[
P\{X = i\} = \frac{1}{n + 1}, \qquad i = 0, 1, \ldots, n
\]

Hint: Make use of the fact that

\[
\int_0^1 x^{a-1}(1 - x)^{b-1}\, dx = \frac{(a - 1)!\,(b - 1)!}{(a + b - 1)!}
\]

when a and b are positive integers.
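
The claim of Problem 7.76 can be verified exactly for small n by combining the binomial coefficient with the integral identity from the hint. In the sketch below, `beta_integral` and `prob_heads` are illustrative names assumed for this example; the computation uses a = i + 1 and b = n − i + 1 in the hint's formula.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Integral of x^(a-1) (1-x)^(b-1) over [0, 1] for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def prob_heads(n, i):
    """P{X = i}: C(n, i) times the integral of p^i (1-p)^(n-i) over a uniform p."""
    return comb(n, i) * beta_integral(i + 1, n - i + 1)


if __name__ == "__main__":
    n = 7
    print([prob_heads(n, i) for i in range(n + 1)])
```

Every entry of the printed list equals 1/(n + 1), so the number of heads is uniformly distributed on 0, 1, ..., n for this n.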
7.77. Suppose that in Problem 7.75, we continue to flip the coin
until a head appears. Let N denote the number of flips needed. Find
a. $P\{N \ge i\}$, $i \ge 1$;
b. $P\{N = i\}$;
c. $E[N]$.


7.78. In Example 6b, let S denote the signal sent and R the signal
received. 

a. Compute E[R]. 

b. Compute Var(R). 

c. Is R normally distributed? 

d. Compute Cov(R, S). 


7.79. In Example 6c, suppose that X is uniformly distributed over
(0, 1). If the discretized regions are determined by $a_0 = 0$, $a_1 = \frac{1}{2}$,
and $a_2 = 1$, calculate the optimal quantizer Y and compute
$E[(X - Y)^2]$.
7.80. The moment generating function of X is given by
$M_X(t) = \exp\{2e^t - 2\}$ and that of Y by $M_Y(t) = \left(\frac{3}{4}e^t + \frac{1}{4}\right)^{10}$. If X and
Y are independent, what are
a. P{X + Y = 2}?
b. P{XY = 0}?
c. E[XY]?


7.81. Let X be the value of the first die and Y the sum of the values
when two dice are rolled. Compute the joint moment generating 
function of X and Y. 

7.82. The joint density of X and Y is given by

\[
f(x, y) = \frac{1}{\sqrt{2\pi}}\, e^{-y} e^{-(x - y)^2/2}, \qquad 0 < y < \infty,\ -\infty < x < \infty
\]


a. Compute the joint moment generating function of X and Y. 
b. Compute the individual moment generating functions. 


7.83. Two envelopes, each containing a check, are placed in front of 
you. You are to choose one of the envelopes, open it, and see the 
amount of the check. At this point, either you can accept that amount 
or you can exchange it for the check in the unopened envelope. 
What should you do? Is it possible to devise a strategy that does 
better than just accepting the first envelope? 

Let A and B,A < B, denote the (unknown) amounts of the checks, 
and note that the strategy that randomly selects an envelope and 
always accepts its check has an expected return of (A + B)/2. 
Consider the following strategy: Let $F(\cdot)$ be any strictly increasing
(that is, continuous) distribution function. Choose an envelope
randomly and open it. If the discovered check has the value x, then
accept it with probability F(x) and exchange it with probability
1 − F(x).

a. Show that if you employ the latter strategy, then your expected
return is greater than (A + B)/2.
Hint: Condition on whether the first envelope has the value A
or B.

Now consider the strategy that fixes a value x and then
accepts the first check if its value is greater than x and
exchanges it otherwise.

b. Show that for any x, the expected return under the
x-strategy is always at least (A + B)/2 and that it is
strictly larger than (A + B)/2 if x lies between A and B.

c. Let X be a continuous random variable on the whole
line, and consider the following strategy: Generate the
value of X, and if X = x, then employ the x-strategy of
part (b). Show that the expected return under this
strategy is greater than (A + B)/2.
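
The randomized strategy of Problem 7.83 can be explored by simulation. The sketch below assumes, purely for illustration, that F is the logistic distribution function and that the checks are worth A = 1 and B = 2; these choices, the function names, the trial count, and the seed are not taken from the text. It estimates the average return from accepting the opened check with probability F(x) and compares it with (A + B)/2.

```python
import math
import random


def logistic_cdf(x):
    """A strictly increasing distribution function, chosen here only as an example of F."""
    return 1.0 / (1.0 + math.exp(-x))


def average_return(a, b, trials=200_000, seed=0, cdf=logistic_cdf):
    """Average payoff when the opened check x is kept with probability cdf(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Pick one of the two envelopes at random and open it.
        first, other = (a, b) if rng.random() < 0.5 else (b, a)
        keep = rng.random() < cdf(first)
        total += first if keep else other
    return total / trials


if __name__ == "__main__":
    a, b = 1.0, 2.0
    print(average_return(a, b), (a + b) / 2)
```

How large the improvement over (A + B)/2 is depends on how much F increases between A and B, so other choices of F or of the check amounts give different margins.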


7.84. Successive weekly sales, in units of $1,000, have a bivariate
normal distribution with common mean 40, common standard 
deviation 6, and correlation .6. 

a. Find the probability that the total of the next 2 weeks’ sales 
exceeds 90. 

b. If the correlation were .2 rather than .6, do you think that this 
would increase or decrease the answer to (a)? Explain your 
reasoning. 

c. Repeat (a) when the correlation is .2. 


Theoretical exercises 


7.1. Show that $E[(X - a)^2]$ is minimized at $a = E[X]$.

7.2. Suppose that X is a continuous random variable with density function f.
Show that E[|X − a|] is minimized when a is equal to the median of F.
Hint: Write

\[
E[|X - a|] = \int |x - a|\, f(x)\, dx
\]

Now break up the integral into the regions where x < a and where x > a, and
differentiate.
7.3. Prove Proposition 2.1 when
a. X and Y have a joint probability mass function;
b. X and Y have a joint probability density function and $g(x, y) \ge 0$ for all
x, y.


7.4. Let X be a random variable having finite expectation $\mu$ and variance $\sigma^2$,
and let $g(\cdot)$ be a twice differentiable function. Show that

\[
E[g(X)] \approx g(\mu) + \frac{g''(\mu)}{2}\sigma^2
\]

Hint: Expand $g(\cdot)$ in a Taylor series about $\mu$. Use the first three terms and
ignore the remainder.
7.5. If $X \ge 0$ and g is a differentiable function such that g(0) = 0, show that

\[
E[g(X)] = \int_0^{\infty} P(X > t)\, g'(t)\, dt
\]

Hint: Define random variables $I(t)$, $t \ge 0$ so that

\[
g(X) = \int_0^{X} g'(t)\, dt = \int_0^{\infty} I(t)\, g'(t)\, dt
\]


7.6. Let $A_1, A_2, \ldots, A_n$ be arbitrary events, and define
$C_k = \{\text{at least } k \text{ of the } A_i \text{ occur}\}$. Show that

\[
\sum_{k=1}^{n} P(C_k) = \sum_{k=1}^{n} P(A_k)
\]

Hint: Let X denote the number of the $A_i$ that occur. Show that both sides of
the preceding equation are equal to E[X].
7.7. In the text, we noted that

\[
E\left[\sum_{i=1}^{\infty} X_i\right] = \sum_{i=1}^{\infty} E[X_i]
\]

when the $X_i$ are all nonnegative random variables. Since an integral is a limit
of sums, one might expect that

\[
E\left[\int_0^{\infty} X(t)\, dt\right] = \int_0^{\infty} E[X(t)]\, dt
\]

whenever $X(t)$, $0 \le t < \infty$, are all nonnegative random variables; this result is
indeed true. Use it to give another proof of the result that for a nonnegative
random variable X,

\[
E[X] = \int_0^{\infty} P\{X > t\}\, dt
\]

Hint: Define, for each nonnegative t, the random variable X(t) by

\[
X(t) = \begin{cases} 1 & \text{if } t < X \\ 0 & \text{if } t \ge X \end{cases}
\]

Now relate $\int_0^{\infty} X(t)\, dt$ to X.


7.8. We say that X is stochastically larger than Y, written $X \ge_{st} Y$, if, for all t,

\[
P\{X > t\} \ge P\{Y > t\}
\]

Show that if $X \ge_{st} Y$, then $E[X] \ge E[Y]$ when
a. X and Y are nonnegative random variables;
b. X and Y are arbitrary random variables.

Hint: Write X as

\[
X = X^{+} - X^{-}
\]

where

\[
X^{+} = \begin{cases} X & \text{if } X \ge 0 \\ 0 & \text{if } X < 0 \end{cases}
\qquad
X^{-} = \begin{cases} 0 & \text{if } X \ge 0 \\ -X & \text{if } X < 0 \end{cases}
\]

Similarly, represent Y as $Y^{+} - Y^{-}$. Then make use of part (a).
7.9. Show that X is stochastically larger than Y if and only if

\[
E[f(X)] \ge E[f(Y)]
\]

for all increasing functions f.

Hint: Show that if $X \ge_{st} Y$, then $E[f(X)] \ge E[f(Y)]$ by showing that
$f(X) \ge_{st} f(Y)$ and then using Theoretical Exercise 7.8. To show that if
$E[f(X)] \ge E[f(Y)]$ for all increasing functions f, then $P\{X > t\} \ge P\{Y > t\}$,
define an appropriate increasing function f.

7.10. A coin having probability p of landing on heads is flipped n times. 
Compute the expected number of runs of heads of size 1, of size 2, and of 
size k, $1 \le k \le n$.

7.11. Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed positive
random variables. For $k \le n$, find

\[
E\left[\frac{\sum_{i=1}^{k} X_i}{\sum_{i=1}^{n} X_i}\right]
\]
7.12. Consider n independent trials, each resulting in any one of r possible
outcomes with probabilities $P_1, P_2, \ldots, P_r$. Let X denote the number of outcomes
that never occur in any of the trials. Find E[X] and show that among all
probability vectors $P_1, \ldots, P_r$, E[X] is minimized when $P_i = 1/r$, $i = 1, \ldots, r$.
7.13. Let $X_1, X_2, \ldots$ be a sequence of independent random variables having the
probability mass function

\[
P\{X_n = 0\} = P\{X_n = 2\} = 1/2, \qquad n \ge 1
\]

The random variable $X = \sum_{n=1}^{\infty} X_n/3^n$ is said to have the Cantor distribution.
Find E[X] and Var(X).

7.14. Let $X_1, \ldots, X_n$ be independent and identically distributed continuous
random variables. We say that a record value occurs at time j, $j \le n$, if $X_j \ge X_i$
for all $1 \le i \le j$. Show that
a. $E[\text{number of record values}] = \sum_{j=1}^{n} 1/j$;
b. $\text{Var}(\text{number of record values}) = \sum_{j=1}^{n} (j - 1)/j^2$.
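
A short simulation gives a plausibility check for the two record-value formulas in Theoretical Exercise 7.14: a mean equal to the sum of 1/j and a variance equal to the sum of (j − 1)/j², each over j = 1, ..., n. The sketch below uses uniform (0, 1) draws as one convenient continuous distribution; that choice, the function names, and the trial count are assumptions of this example.

```python
import random


def count_records(values):
    """Number of positions j at which values[j] exceeds every earlier value."""
    best = float("-inf")
    records = 0
    for v in values:
        if v > best:
            records += 1
            best = v
    return records


def simulate_records(n, trials=20_000, seed=0):
    """Sample mean and sample variance of the record count for n i.i.d. uniforms."""
    rng = random.Random(seed)
    counts = [count_records([rng.random() for _ in range(n)]) for _ in range(trials)]
    mean = sum(counts) / trials
    var = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    return mean, var


def stated_mean(n):
    """Sum of 1/j for j = 1..n, the mean claimed in part (a)."""
    return sum(1 / j for j in range(1, n + 1))


def stated_variance(n):
    """Sum of (j - 1)/j^2 for j = 1..n, the variance claimed in part (b)."""
    return sum((j - 1) / j**2 for j in range(1, n + 1))


if __name__ == "__main__":
    print(simulate_records(10), stated_mean(10), stated_variance(10))
```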


7.15. For Example 2i, show that the variance of the number of coupons
needed to amass a full set is equal to

\[
\sum_{i=1}^{N-1} \frac{iN}{(N - i)^2}
\]

When N is large, this can be shown to be approximately equal (in the sense
that their ratio approaches 1 as $N \to \infty$) to $N^2\pi^2/6$.
7.16. Consider n independent trials, the ith of which results in a success with
probability $P_i$.
a. Compute the expected number of successes in the n trials; call it $\mu$.
b. For a fixed value of $\mu$, what choice of $P_1, \ldots, P_n$ maximizes the variance
of the number of successes?
c. What choice minimizes the variance?


* 7.17. Suppose that each of the elements of $S = \{1, 2, \ldots, n\}$ is to be colored
either red or blue. Show that if $A_1, \ldots, A_r$ are subsets of S, there is a way of
doing the coloring so that at most $\sum_{i=1}^{r} (1/2)^{|A_i| - 1}$ of these subsets have all
their elements the same color (where |A| denotes the number of elements in
the set A).
7.18. Suppose that $X_1$ and $X_2$ are independent random variables having a
common mean $\mu$. Suppose also that $\text{Var}(X_1) = \sigma_1^2$ and $\text{Var}(X_2) = \sigma_2^2$. The value
of $\mu$ is unknown, and it is proposed that $\mu$ be estimated by a weighted average
of $X_1$ and $X_2$. That is, $\lambda X_1 + (1 - \lambda)X_2$ will be used as an estimate of $\mu$ for
some appropriate value of $\lambda$. Which value of $\lambda$ yields the estimate having the
lowest possible variance? Explain why it is desirable to use this value of $\lambda$.
7.19. In Example 4f, we showed that the covariance of the multinomial
random variables $N_i$ and $N_j$ is equal to $-mP_iP_j$ by expressing $N_i$ and $N_j$ as
the sum of indicator variables. We could also have obtained that result by
using the formula

\[
\text{Var}(N_i + N_j) = \text{Var}(N_i) + \text{Var}(N_j) + 2\,\text{Cov}(N_i, N_j)
\]

a. What is the distribution of $N_i + N_j$?
b. Use the preceding identity to show that $\text{Cov}(N_i, N_j) = -mP_iP_j$.


7.20. Show that if X and Y are identically distributed and not necessarily
independent, then
Cov(X + Y, X − Y) = 0


7.21. The Conditional Covariance Formula. The conditional covariance of X 
and Y, given Z, is defined by 
Cov(X, Y|Z) = E[(X − E[X|Z])(Y − E[Y|Z]) | Z]


a. Show that 
Cov(X, Y|Z) = E[XY|Z] — E[X|Z]E[Y|Z] 


b. Prove the conditional covariance formula 
Cov(X,Y) = E[Cov(X,Y|Z)] 
+ Cov(E[X|Z], E[Y|Z]) 


c. Set X = Y in part (b) and obtain the conditional variance formula. 


7.22. Let $X_{(i)}$, $i = 1, \ldots, n$, denote the order statistics from a set of n uniform (0, 1)
random variables, and note that the density function of $X_{(i)}$ is given by

\[
f(x) = \frac{n!}{(i - 1)!\,(n - i)!}\, x^{i-1}(1 - x)^{n-i}, \qquad 0 < x < 1
\]

a. Compute $\text{Var}(X_{(i)})$, $i = 1, \ldots, n$.

b. Which value of i minimizes, and which value maximizes, $\text{Var}(X_{(i)})$?


7.23. Show that if $Y = a + bX$, then

\[
\rho(X, Y) = \begin{cases} +1 & \text{if } b > 0 \\ -1 & \text{if } b < 0 \end{cases}
\]


7.24. Show that if Z is a standard normal random variable and if Y is defined by
$Y = a + bZ + cZ^2$, then

\[
\rho(Y, Z) = \frac{b}{\sqrt{b^2 + 2c^2}}
\]


7.25. Prove the Cauchy-Schwarz inequality, namely,

\[
(E[XY])^2 \le E[X^2]\,E[Y^2]
\]

Hint: Unless $Y = -tX$ for some constant, in which case the inequality holds
with equality, it follows that for all t,

\[
0 < E[(tX + Y)^2] = E[X^2]t^2 + 2E[XY]t + E[Y^2]
\]

Hence, the roots of the quadratic equation

\[
E[X^2]t^2 + 2E[XY]t + E[Y^2] = 0
\]

must be imaginary, which implies that the discriminant of this quadratic
equation must be negative.
7.26. Show that if X and Y are independent, then 

E[X|Y = y] = E[X] for all y


a. in the discrete case; 
b. in the continuous case. 


7.27. Prove that E[g(X)Y |X] = g(X)E[Y |X]. 
7.28. Prove that if E[Y |X = x] = E[Y] for all x, then X and Y are uncorrelated; 
give a counterexample to show that the converse is not true. 
Hint: Prove and use the fact that E[XY] = E[XE[Y|X]]. 
7.29. Show that Cov(X, E[Y|X]) = Cov(X,Y). 
7.30. Let $X_1, \ldots, X_n$ be independent and identically distributed random
variables. Find

\[
E[X_1 \mid X_1 + \cdots + X_n = x]
\]


7.31. Consider Example 4f, which is concerned with the multinomial
distribution. Use conditional expectation to compute $E[N_iN_j]$, and then use
this to verify the formula for $\text{Cov}(N_i, N_j)$ given in Example 4f.

7.32. An urn initially contains b black and w white balls. At each stage, we add
r black balls and then withdraw, at random, r balls from the b + w + r balls in
the urn. Show that

\[
E[\text{number of white balls after stage } t] = \left(\frac{b + w}{b + w + r}\right)^{t} w
\]


7.33. For an event A, let $I_A$ equal 1 if A occurs and let it equal 0 if A does not
occur. For a random variable X, show that

\[
E[X \mid A] = \frac{E[X I_A]}{P(A)}
\]


7.34. A coin that lands on heads with probability p is continually flipped. 
Compute the expected number of flips that are made until a string of r heads 
in a row is obtained. 

Hint: Condition on the time of the first occurrence of tails to obtain the
equation

\[
E[X] = (1 - p)\sum_{i=1}^{r} p^{i-1}(i + E[X]) + (1 - p)\sum_{i=r+1}^{\infty} p^{i-1} r
\]

Simplify and solve for E[X].
7.35. For another approach to Theoretical Exercise 7.34, let $T_r$ denote
the number of flips required to obtain a run of r consecutive heads.

a. Determine $E[T_r \mid T_{r-1}]$.

b. Determine $E[T_r]$ in terms of $E[T_{r-1}]$.

c. What is $E[T_1]$?

d. What is $E[T_r]$?


7.36. The probability generating function of the discrete nonnegative integer
valued random variable X having probability mass function $p_j$, $j \ge 0$, is defined
by

\[
\phi(s) = E[s^X] = \sum_{j=0}^{\infty} p_j s^j
\]

Let Y be a geometric random variable with parameter p = 1 − s, where
0 < s < 1. Suppose that Y is independent of X, and show that

\[
\phi(s) = P\{X < Y\}
\]


7.37. One ball at a time is randomly selected from an urn containing a white
and b black balls until all of the remaining balls are of the same color. Let $M_{a,b}$
denote the expected number of balls left in the urn when the experiment ends.
Compute a recursive formula for $M_{a,b}$ and solve when a = 3 and b = 5.
7.38. An urn contains a white and b black balls. After a ball is drawn, it is
returned to the urn if it is white; but if it is black, it is replaced by a white ball
from another urn. Let $M_n$ denote the expected number of white balls in the urn
after the foregoing operation has been repeated n times.

a. Derive the recursive equation

\[
M_{n+1} = \left(1 - \frac{1}{a + b}\right) M_n + 1
\]

b. Use part (a) to prove that

\[
M_n = a + b - b\left(1 - \frac{1}{a + b}\right)^{n}
\]

c. What is the probability that the (n + 1)st ball drawn is white?
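
For Theoretical Exercise 7.38, the recursion M_{n+1} = (1 − 1/(a + b)) M_n + 1 with M_0 = a and the closed form M_n = a + b − b(1 − 1/(a + b))^n can be compared exactly for particular values. The sketch below iterates the recursion in rational arithmetic; the function names and the sample values a = 3, b = 5 are assumptions of this example, and agreement for specific inputs is only a consistency check, not the proof part (b) asks for.

```python
from fractions import Fraction


def mean_by_recursion(a, b, n):
    """Iterate M_{k+1} = (1 - 1/(a+b)) M_k + 1 starting from M_0 = a."""
    m = Fraction(a)
    for _ in range(n):
        m = (1 - Fraction(1, a + b)) * m + 1
    return m


def mean_closed_form(a, b, n):
    """a + b - b (1 - 1/(a+b))^n, the expression in part (b)."""
    return a + b - b * (1 - Fraction(1, a + b)) ** n


if __name__ == "__main__":
    for n in range(6):
        print(n, mean_by_recursion(3, 5, n), mean_closed_form(3, 5, n))
```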


7.39. The best linear predictor of Y with respect to $X_1$ and $X_2$ is equal to
$a + bX_1 + cX_2$, where a, b, and c are chosen to minimize

\[
E[(Y - (a + bX_1 + cX_2))^2]
\]

Determine a, b, and c.

7.40. The best quadratic predictor of Y with respect to X is $a + bX + cX^2$,
where a, b, and c are chosen to minimize $E[(Y - (a + bX + cX^2))^2]$. Determine
a, b, and c.

7.41. Use the conditional variance formula to determine the variance of a 
geometric random variable X having parameter p. 

7.42. Let X be a normal random variable with parameters $\mu = 0$ and $\sigma^2 = 1$,
and let I, independent of X, be such that $P\{I = 1\} = \frac{1}{2} = P\{I = 0\}$. Now define
Y by

\[
Y = \begin{cases} X & \text{if } I = 1 \\ -X & \text{if } I = 0 \end{cases}
\]

In words, Y is equally likely to equal either X or −X.
a. Are X and Y independent?
b. Are I and Y independent?
c. Show that Y is normal with mean 0 and variance 1.
d. Show that Cov(X, Y) = 0.


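7.43. It follows from Proposition 6.1 and the fact that the best linear
predictor of Y with respect to X is $\mu_y + \rho\frac{\sigma_y}{\sigma_x}(X - \mu_x)$ that if

\[
E[Y \mid X] = a + bX
\]

then

\[
a = \mu_y - \rho\frac{\sigma_y}{\sigma_x}\mu_x, \qquad b = \rho\frac{\sigma_y}{\sigma_x}
\]

(Why?) Verify this directly.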
7.44. Show that for random variables X and Z,

\[
E[(X - Y)^2] = E[X^2] - E[Y^2]
\]

where

\[
Y = E[X \mid Z]
\]


7.45. Consider a population consisting of individuals able to produce offspring
of the same kind. Suppose that by the end of its lifetime, each individual will
have produced j new offspring with probability $P_j$, $j \ge 0$, independently of the
number produced by any other individual. The number of individuals initially
present, denoted by $X_0$, is called the size of the zeroth generation. All
offspring of the zeroth generation constitute the first generation, and their
number is denoted by $X_1$. In general, let $X_n$ denote the size of the nth
generation. Let $\mu = \sum_{j=0}^{\infty} jP_j$ and $\sigma^2 = \sum_{j=0}^{\infty} (j - \mu)^2 P_j$ denote, respectively,
the mean and the variance of the number of offspring produced by a single
individual. Suppose that $X_0 = 1$, that is, initially there is a single individual in
the population.
a. Show that

\[
E[X_n] = \mu E[X_{n-1}]
\]

b. Use part (a) to conclude that

\[
E[X_n] = \mu^n
\]

c. Show that

\[
\text{Var}(X_n) = \sigma^2\mu^{n-1} + \mu^2\, \text{Var}(X_{n-1})
\]

d. Use part (c) to conclude that

\[
\text{Var}(X_n) = \begin{cases} \sigma^2\mu^{n-1}\left(\dfrac{\mu^n - 1}{\mu - 1}\right) & \text{if } \mu \ne 1 \\ n\sigma^2 & \text{if } \mu = 1 \end{cases}
\]

The model just described is known as a branching process, and an
important question for a population that evolves along such lines is the
probability that the population will eventually die out. Let $\pi$ denote this
probability when the population starts with a single individual. That is,

\[
\pi = P\{\text{population eventually dies out} \mid X_0 = 1\}
\]

e. Argue that $\pi$ satisfies

\[
\pi = \sum_{j=0}^{\infty} P_j \pi^j
\]

Hint: Condition on the number of offspring of the initial member of the
population.
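
The branching process of Theoretical Exercise 7.45 can be simulated generation by generation, which gives a rough numerical check that the mean generation size grows like μ^n (the result of iterating part (a) from X_0 = 1). The offspring distribution used below, with P_0 = 1/4, P_1 = 1/4, P_2 = 1/2, is an arbitrary choice for illustration, and the function names, trial count, and seed are likewise assumptions of this sketch.

```python
import random


def next_generation(size, offspring_probs, rng):
    """Total offspring of `size` individuals, each reproducing independently."""
    counts = list(range(len(offspring_probs)))
    return sum(rng.choices(counts, weights=offspring_probs, k=size))


def generation_size(n, offspring_probs, rng):
    """Size of the nth generation, starting from a single individual."""
    size = 1
    for _ in range(n):
        size = next_generation(size, offspring_probs, rng)
    return size


def estimate_mean_size(n, offspring_probs, trials=20_000, seed=0):
    """Monte Carlo estimate of E[X_n]."""
    rng = random.Random(seed)
    return sum(generation_size(n, offspring_probs, rng) for _ in range(trials)) / trials


def offspring_mean(offspring_probs):
    """mu: the sum of j * P_j over the offspring distribution."""
    return sum(j * p for j, p in enumerate(offspring_probs))


if __name__ == "__main__":
    probs = [0.25, 0.25, 0.5]  # illustrative P_0, P_1, P_2
    print(estimate_mean_size(5, probs), offspring_mean(probs) ** 5)
```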

7.46. Verify the formula for the moment generating function of a uniform
random variable that is given in Table 7.2. Also, differentiate to verify the
formulas for the mean and variance.


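7.47. For a standard normal random variable Z, let $\mu_n = E[Z^n]$. Show that

\[
\mu_n = \begin{cases} 0 & \text{when } n \text{ is odd} \\ \dfrac{(2j)!}{2^j j!} & \text{when } n = 2j \end{cases}
\]

Hint: Start by expanding the moment generating function of Z into a Taylor
series about 0 to obtain

\[
E[e^{tZ}] = e^{t^2/2} = \sum_{j=0}^{\infty} \frac{(t^2/2)^j}{j!}
\]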


7.48. Let X be a normal random variable with mean $\mu$ and variance $\sigma^2$. Use
the results of Theoretical Exercise 7.47 to show that

\[
E[X^n] = \sum_{j=0}^{[n/2]} \binom{n}{2j} \mu^{n-2j}\sigma^{2j}\, \frac{(2j)!}{2^j j!}
\]

In the preceding equation, [n/2] is the largest integer less than or equal to
n/2. Check your answer by letting n = 1 and n = 2.
7.49. If Y = aX + b, where a and b are constants, express the moment 
generating function of Y in terms of the moment generating function of X. 
7.50. The positive random variable X is said to be a lognormal random
variable with parameters $\mu$ and $\sigma^2$ if log(X) is a normal random variable with
mean $\mu$ and variance $\sigma^2$. Use the normal moment generating function to find
the mean and variance of a lognormal random variable.
7.51. Let X have moment generating function M(t), and define $\Psi(t) = \log M(t)$.
Show that

\[
\Psi''(t)\big|_{t=0} = \text{Var}(X)
\]


7.52. Use Table 7.2 to determine the distribution of $\sum_{i=1}^{n} X_i$ when $X_1, \ldots, X_n$
are independent and identically distributed exponential random variables,
each having mean $1/\lambda$.
7.53. Show how to compute Cov(X, Y) from the joint moment generating 
function of X and Y. 
7.54. Suppose that $X_1, \ldots, X_n$ have a multivariate normal distribution. Show that
$X_1, \ldots, X_n$ are independent random variables if and only if

\[
\text{Cov}(X_i, X_j) = 0 \quad \text{when } i \ne j
\]

7.55. If Z is a standard normal random variable, what is $\text{Cov}(Z, Z^2)$?
7.56. Suppose that Y is a normal random variable with mean $\mu$ and variance
$\sigma^2$, and suppose also that the conditional distribution of X, given that Y = y, is
normal with mean y and variance 1.
a. Argue that the joint distribution of X,Y is the same as that of Y + Z,Y 
when Z is a standard normal random variable that is independent of Y. 
b. Use the result of part (a) to argue that X, Y has a bivariate normal 
distribution. 
c. Find E[X], Var(X), and Corr (X,Y). 
d. Find E[Y|X = x]. 
e. What is the conditional distribution of Y given that X = x? 


Self-Test Problems and Exercises 


7.1. Consider a list of m names, where the same name may appear more than 
once on the list. Let n(i), i = 1,...,m, denote the number of times that the 
name in position i appears on the list, and let d denote the number of distinct 
names on the list. 

a. Express d in terms of the variables m, n(i), i = 1, ..., m. Let U be a
uniform (0, 1) random variable, and let X = [mU] + 1.
b. What is the probability mass function of X? 
c. Argue that E[m/n(X)] = d. 


7.2. An urn has n white and m black balls that are removed one at a time in a
randomly chosen order. Find the expected number of instances in which a 
white ball is immediately followed by a black one. 
7.3. Twenty individuals consisting of 10 married couples are to be seated at 5
different tables, with 4 people at each table. 
a. If the seating is done “at random,” what is the expected number of 
married couples that are seated at the same table? 
b. If 2 men and 2 women are randomly chosen to be seated at each table, 
what is the expected number of married couples that are seated at the 
same table? 


7.4 If a die is to be rolled until all sides have appeared at least once, find the
expected number of times that outcome 1 appears. 

7.5. A deck of 2n cards consists of n red and n black cards. The cards are 
shuffled and then turned over one at a time. Suppose that each time a red 
card is turned over, we win 1 unit if more red cards than black cards have 
been turned over by that time. (For instance, if n = 2 and the result is r b r b,
then we would win a total of 2 units.) Find the expected amount that we win.
7.6. Let $A_1, A_2, \ldots, A_n$ be events, and let N denote the number of them that
occur. Also, let I = 1 if all of these events occur, and let it be 0 otherwise.
Prove Bonferroni’s inequality, namely,

$$P(A_1 \cdots A_n) \ge \sum_{i=1}^{n} P(A_i) - (n - 1)$$

Hint: Argue first that $N \le n - 1 + I$.

7.7. Let X be the smallest value obtained when k numbers are randomly 
chosen from the set 1, ...,n. Find E[X] by interpreting X as a negative 
hypergeometric random variable. 

7.8. An arriving plane carries r families. A total of $n_j$ of these families have
checked in a total of j pieces of luggage, $\sum_j n_j = r$. Suppose that when the
plane lands, the $N = \sum_j j n_j$ pieces of luggage come out of the plane in a
random order. As soon as a family collects all of its luggage, it immediately
departs the airport. If the Sanchez family checked in j pieces of luggage, find
the expected number of families that depart after they do.

* 7.9. Nineteen items on the rim of a circle of radius 1 are to be chosen. Show 
that for any choice of these points, there will be an arc of (arc) length 1 that 
contains at least 4 of them. 

7.10. Let X be a Poisson random variable with mean $\lambda$. Show that if $\lambda$ is not
too small, then

$$\mathrm{Var}(\sqrt{X}) \approx .25$$

Hint: Use the result of Theoretical Exercise 7.4 to approximate $E[\sqrt{X}]$.
7.11. Suppose in Self-Test Problem 7.3 that the 20 people are to be 
seated at seven tables, three of which have 4 seats and four of which have 2 
seats. If the people are randomly seated, find the expected value of the 
number of married couples that are seated at the same table. 
7.12. Individuals 1 through n,n > 1, are to be recruited into a firm in the 
following manner: Individual 1 starts the firm and recruits individual 2. 
Individuals 1 and 2 will then compete to recruit individual 3. Once individual 3 
is recruited, individuals 1, 2, and 3 will compete to recruit individual 4, and so 
on. Suppose that when individuals 1, 2, ...,i compete to recruit individual i+ 1, 
each of them is equally likely to be the successful recruiter. 
a. Find the expected number of the individuals 1, ...,n who did not recruit 
anyone else. 
b. Derive an expression for the variance of the number of individuals who 
did not recruit anyone else, and evaluate it for n = 5. 
7.13 The nine players on a basketball team consist of 2 centers, 3 forwards,
and 4 backcourt players. If the players are paired up at random into three 
groups of size 3 each, find (a) the expected value and (b) the variance of the 
number of triplets consisting of one of each type of player. 
7.14 A deck of 52 cards is shuffled and a bridge hand of 13 cards is dealt out. 
Let X and Y denote, respectively, the number of aces and the number of 
spades in the hand. 

a. Show that X and Y are uncorrelated. 

b. Are they independent? 


7.15 Each coin in a bin has a value attached to it. Each time that a coin with 
value p is flipped, it lands on heads with probability p. When a coin is 
randomly chosen from the bin, its value is uniformly distributed on (0, 1). 
Suppose that after the coin is chosen but before it is flipped, you must predict 
whether it will land on heads or on tails. You will win 1 if you are correct and 
will lose 1 otherwise. 

a. What is your expected gain if you are not told the value of the coin? 

b. Suppose now that you are allowed to inspect the coin before it is 
flipped, with the result of your inspection being that you learn the value 
of the coin. As a function of p, the value of the coin, what prediction 
should you make? 

c. Under the conditions of part (b), what is your expected gain? 


7.16. In Self-Test Problem 7.1, we showed how to use the value of a 
uniform (0, 1) random variable (commonly called a random number) to obtain 
the value of a random variable whose mean is equal to the expected number 
of distinct names on a list. However, its use required that one choose a 
random position and then determine the number of times that the name in that 
position appears on the list. Another approach, which can be more efficient 
when there is a large amount of replication of names, is as follows: As before, 
start by choosing the random variable X as in Problem 7.1. Now identify
the name in position X, and then go through the list, starting at the beginning,
until that name appears. Let I equal 0 if you encounter that name before
getting to position X, and let I equal 1 if your first encounter with the name is
at position X. Show that E[mI] = d.

Hint: Compute E[I] by using conditional expectation.

7.17. A total of m items are to be sequentially distributed among n cells, with
each item independently being put in cell j with probability $p_j$, $\sum_{j=1}^{n} p_j = 1$. Find
the expected number of collisions that occur, where a collision occurs 
whenever an item is put into a nonempty cell. 

7.18. Let X be the length of the initial run in a random ordering of n ones and
m zeros. That is, if the first k values are the same (either all ones or all zeros),
then $X \ge k$. Find E[X].

7.19. There are n items in a box labeled H and m in a box labeled T. A coin 
that comes up heads with probability p and tails with probability 1 — p is 
flipped. Each time it comes up heads, an item is removed from the H box, and 
each time it comes up tails, an item is removed from the T box. (If a box is 
empty and its outcome occurs, then no items are removed.) Find the expected 
number of coin flips needed for both boxes to become empty. 

Hint: Condition on the number of heads in the first n + m flips. 

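7.20. Let X be a nonnegative random variable having distribution function F.
Show that if $\bar{F}(x) = 1 - F(x)$, then

$$E[X^n] = \int_0^\infty n x^{n-1} \bar{F}(x)\,dx$$

Hint: Start with the identity

$$X^n = n \int_0^X x^{n-1}\,dx = n \int_0^\infty x^{n-1} I_X(x)\,dx$$

where

$$I_X(x) = \begin{cases} 1, & \text{if } x < X \\ 0, & \text{otherwise} \end{cases}$$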


* 7.21. Let $a_1, \ldots, a_n$, not all equal to 0, be such that $\sum_{i=1}^{n} a_i = 0$. Show that
there is a permutation $i_1, \ldots, i_n$ such that $\sum_{j=1}^{n-1} a_{i_j} a_{i_{j+1}} < 0$.
Hint: Use the probabilistic method. (It is interesting that there need not be a
permutation whose sum of products of successive pairs is positive. For
instance, if n = 3, $a_1 = a_2 = -1$, and $a_3 = 2$, there is no such permutation.)
7.22. Suppose that $X_i$, i = 1, 2, 3, are independent Poisson random variables
with respective means $\lambda_i$, i = 1, 2, 3. Let $X = X_1 + X_2$ and $Y = X_2 + X_3$. The
random vector X, Y is said to have a bivariate Poisson distribution.

a. Find E[X] and E[Y]. 

b. Find Cov(X,Y). 

c. Find the joint probability mass function P{X = i, Y = j}. 


7.23. Let $(X_i, Y_i)$, i = 1, ..., be a sequence of independent and identically
distributed random vectors. That is, $X_1, Y_1$ is independent of, and has the
same distribution as, $X_2, Y_2$, and so on. Although $X_i$ and $Y_i$ can be dependent,
$X_i$ and $Y_j$ are independent when $i \ne j$. Let

$$\mu_x = E[X_i], \quad \mu_y = E[Y_i], \quad \sigma_x^2 = \mathrm{Var}(X_i), \quad \sigma_y^2 = \mathrm{Var}(Y_i), \quad \rho = \mathrm{Corr}(X_i, Y_i)$$

Find $\mathrm{Corr}\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{n} Y_j\right)$.
7.24. Three cards are randomly chosen without replacement from an ordinary 
deck of 52 cards. Let X denote the number of aces chosen. 

a. Find E[X|the ace of spades is chosen]. 


b. Find E[X|at least one ace is chosen]. 


7.25. Let $\Phi$ be the standard normal distribution function, and let X be a normal
random variable with mean $\mu$ and variance 1. We want to find $E[\Phi(X)]$. To do
so, let Z be a standard normal random variable that is independent of X, and
let

$$I = \begin{cases} 1, & \text{if } Z < X \\ 0, & \text{if } Z \ge X \end{cases}$$

a. Show that $E[I \mid X = x] = \Phi(x)$.
b. Show that $E[\Phi(X)] = P\{Z < X\}$.
c. Show that $E[\Phi(X)] = \Phi\left(\frac{\mu}{\sqrt{2}}\right)$.

Hint: What is the distribution of X − Z?

The preceding comes up in statistics. Suppose you are about to observe the
value of a random variable X that is normally distributed with an unknown
mean $\mu$ and variance 1, and suppose that you want to test the hypothesis that
the mean $\mu$ is greater than or equal to 0. Clearly you would want to reject this
hypothesis if X is sufficiently small. If it results that X = x, then the p-value of
the hypothesis that the mean is greater than or equal to 0 is defined to be the
probability that X would be as small as x if $\mu$ were equal to 0 (its smallest
possible value if the hypothesis were true). (A small p-value is taken as an
indication that the hypothesis is probably false.) Because X has a standard
normal distribution when $\mu = 0$, the p-value that results when X = x is $\Phi(x)$.
Therefore, the preceding shows that the expected p-value that results when
the true mean is $\mu$ is $\Phi\left(\frac{\mu}{\sqrt{2}}\right)$.


7.26. A coin that comes up heads with probability p is flipped until either a 
total of n heads or of m tails is amassed. Find the expected number of flips. 
Hint: Imagine that one continues to flip even after the goal is attained. Let X 
denote the number of flips needed to obtain n heads, and let Y denote the 
number of flips needed to obtain m tails. Note that
max(X, Y) + min(X, Y) = X + Y. Compute E[max(X, Y)] by conditioning on the
number of heads in the first n + m - 1 flips.

7.27. A deck of n cards numbered 1 through n, initially in any arbitrary order, is
shuffled in the following manner: At each stage, we randomly choose one of 
the cards and move it to the front of the deck, leaving the relative positions of 
the other cards unchanged. This procedure is continued until all but one of the 
cards has been chosen. At this point, it follows by symmetry that all n! 
possible orderings are equally likely. Find the expected number of stages that 
are required. 

7.28. Suppose that a sequence of independent trials in which each trial is a 
success with probability p is performed until either a success occurs or a total 
of n trials has been reached. Find the mean number of trials that are 
performed. 

Hint: The computations are simplified if you use the identity that for a
nonnegative integer valued random variable X,

$$E[X] = \sum_{i=1}^{\infty} P\{X \ge i\}$$


7.29. Suppose that X and Y are both Bernoulli random variables. Show that X
and Y are independent if and only if Cov(X, Y) = 0.

7.30. In the generalized match problem, there are n individuals of whom $n_i$
wear hat size i, $\sum_{i=1}^{r} n_i = n$. There are also n hats, of which $h_i$ are of size
i, $\sum_{i=1}^{r} h_i = n$. If each individual randomly chooses a hat (without
replacement), find the expected number who choose a hat that is their size.
7.31. For random variables X and Y, show that

$$\sqrt{\mathrm{Var}(X + Y)} \le \sqrt{\mathrm{Var}(X)} + \sqrt{\mathrm{Var}(Y)}$$

That is, show that the standard deviation of a sum is always less than or equal
to the sum of the standard deviations.

7.32. Let $R_1, \ldots, R_{n+m}$ be a random permutation of 1, ..., n + m. (That is,
$R_1, \ldots, R_{n+m}$ is equally likely to be any of the (n + m)! permutations of
1, ..., n + m.) For a given $i \le n$, let $X_i$ be the ith smallest of the values
$R_1, \ldots, R_n$. Show that

$$E[X_i] = i + i\,\frac{m}{n+1}$$

Hint: Note that if we let $I_{n+k}$ equal 1 if $R_{n+k} < X_i$ and let it equal 0 otherwise,
then

$$X_i = i + \sum_{k=1}^{m} I_{n+k}$$


7.33. Suppose that Y is uniformly distributed over (0, 1), and that the
conditional distribution of X, given that Y = y, is uniform over (0, y).
a. Find E[X]. 
b. Find Cov(X,Y). 
c. Find Var(X). 
d. Find P{X < x}. 
e. Find the probability density function of X. 
8.1 Introduction


The most important theoretical results in probability theory are limit theorems. Of
these, the most important are those classified either under the heading laws of large
numbers or under the heading central limit theorems. Usually, theorems are
considered to be laws of large numbers if they are concerned with stating conditions 
under which the average of a sequence of random variables converges (in some 
sense) to the expected average. By contrast, central limit theorems are concerned 
with determining conditions under which the sum of a large number of random 
variables has a probability distribution that is approximately normal. 


8.2 Chebyshev’s Inequality and the Weak Law of Large Numbers


We start this section by proving a result known as Markov’s inequality. 


Proposition 2.1 Markov’s inequality 


If X is a random variable that takes only nonnegative values, then for any value
a > 0,

$$P\{X \ge a\} \le \frac{E[X]}{a}$$

Proof For a > 0, let

$$I = \begin{cases} 1 & \text{if } X \ge a \\ 0 & \text{otherwise} \end{cases}$$

and note that, since $X \ge 0$,

$$I \le \frac{X}{a}$$

Taking expectations of the preceding inequality yields

$$E[I] \le \frac{E[X]}{a}$$

which, because $E[I] = P\{X \ge a\}$, proves the result.
As a corollary, we obtain Proposition 2.2. 


Proposition 2.2 Chebyshev’s inequality 


If X is a random variable with finite mean $\mu$ and variance $\sigma^2$, then for any value
k > 0,

$$P\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}$$

Proof Since $(X - \mu)^2$ is a nonnegative random variable, we can apply Markov’s
inequality (with $a = k^2$) to obtain

$$P\{(X - \mu)^2 \ge k^2\} \le \frac{E[(X - \mu)^2]}{k^2} \tag{2.1}$$

But since $(X - \mu)^2 \ge k^2$ if and only if $|X - \mu| \ge k$, Equation (2.1) is equivalent
to

$$P\{|X - \mu| \ge k\} \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$$

and the proof is complete.


The importance of Markov’s and Chebyshev’s inequalities is that they enable us to 
derive bounds on probabilities when only the mean or both the mean and the 
variance of the probability distribution are known. Of course, if the actual distribution 
were known, then the desired probabilities could be computed exactly and we would 
not need to resort to bounds. 


Example 2a 


Suppose that it is known that the number of items produced in a factory during a 
week is a random variable with mean 50. 


a. What can be said about the probability that this week’s production will 
exceed 75? 

b. If the variance of a week’s production is known to equal 25, then what can 
be said about the probability that this week’s production will be between 
40 and 60? 


Solution 


Let X be the number of items that will be produced in a week. 


a. By Markov’s inequality,

$$P\{X > 75\} \le \frac{E[X]}{75} = \frac{50}{75} = \frac{2}{3}$$

b. By Chebyshev’s inequality,

$$P\{|X - 50| \ge 10\} \le \frac{\sigma^2}{10^2} = \frac{1}{4}$$

Hence,

$$P\{|X - 50| < 10\} \ge 1 - \frac{1}{4} = \frac{3}{4}$$

so the probability that this week’s production will be between 40 and 60 is at least
.75.
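The two bounds of Example 2a can be checked with a few lines of exact arithmetic. The following is only a minimal sketch: the helper names `markov_bound` and `chebyshev_bound` are our own and do not come from the text.

```python
from fractions import Fraction

def markov_bound(mean, a):
    """Markov's upper bound E[X]/a on P{X >= a} for nonnegative X (capped at 1)."""
    return min(Fraction(1), Fraction(mean) / Fraction(a))

def chebyshev_bound(variance, k):
    """Chebyshev's upper bound sigma^2/k^2 on P{|X - mu| >= k} (capped at 1)."""
    return min(Fraction(1), Fraction(variance) / Fraction(k) ** 2)

# Part (a): mean 50, threshold 75.
part_a = markov_bound(50, 75)
# Part (b): variance 25, deviation 10; the complement gives a lower bound.
part_b = 1 - chebyshev_bound(25, 10)
print(part_a, part_b)
```

Using `Fraction` keeps the results as exact ratios, so they can be compared directly with the fractions in the solution.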


As Chebyshev’s inequality is valid for all distributions of the random variable X, we 
cannot expect the bound on the probability to be very close to the actual probability 
in most cases. For instance, consider Example 2b.


Example 2b 


If X is uniformly distributed over the interval (0, 10), then, since E[X] = 5 and
Var(X) = 25/3, it follows from Chebyshev’s inequality that

$$P\{|X - 5| > 4\} \le \frac{25}{3(16)} \approx .52$$

whereas the exact result is

$$P\{|X - 5| > 4\} = .20$$


Thus, although Chebyshev’s inequality is correct, the upper bound that it 
provides is not particularly close to the actual probability. 


Similarly, if X is a normal random variable with mean $\mu$ and variance $\sigma^2$,
Chebyshev’s inequality states that

$$P\{|X - \mu| > 2\sigma\} \le \frac{1}{4}$$

whereas the actual probability is given by

$$P\{|X - \mu| > 2\sigma\} = P\left\{\left|\frac{X - \mu}{\sigma}\right| > 2\right\} = 2[1 - \Phi(2)] \approx .0456$$
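To see how loose the bound is in these two cases, the comparison can be reproduced numerically. This is a sketch using only the Python standard library; the variable names are ours, and the normal probability is computed from `statistics.NormalDist` rather than read from a table, so its last digit may differ slightly from the tabulated value.

```python
from fractions import Fraction
from statistics import NormalDist

# Uniform (0, 10): variance 25/3, deviation k = 4.
uniform_bound = Fraction(25, 3) / 4 ** 2          # Chebyshev: sigma^2 / k^2
uniform_exact = Fraction(1, 10) + Fraction(1, 10)  # P{X < 1} + P{X > 9}

# Normal: deviation of two standard deviations.
normal_bound = Fraction(1, 4)                      # sigma^2 / (2 sigma)^2
normal_exact = 2 * (1 - NormalDist().cdf(2))       # 2[1 - Phi(2)]

print(float(uniform_bound), float(uniform_exact))
print(float(normal_bound), normal_exact)
```

In both cases the bound exceeds the true probability by a wide margin, which is the price of a result that holds for every distribution.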


Chebyshev’s inequality is often used as a theoretical tool in proving results. This use 
is illustrated first by Proposition 2.3 and then, most importantly, by the weak law
of large numbers.


Proposition 2.3 


If Var(X) = 0, then 


P{X = E[X]} =1 


In other words, the only random variables having variances equal to 0 are those 
that are constant with probability 1. 


Proof By Chebyshev’s inequality, we have, for any $n \ge 1$,

$$P\left\{|X - \mu| > \frac{1}{n}\right\} = 0$$

Letting $n \to \infty$ and using the continuity property of probability yields

$$0 = \lim_{n \to \infty} P\left\{|X - \mu| > \frac{1}{n}\right\} = P\left\{\lim_{n \to \infty} \left\{|X - \mu| > \frac{1}{n}\right\}\right\} = P\{X \ne \mu\}$$


and the result is established. 


Theorem 2.1 The weak law of large numbers 


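Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, each having finite mean $E[X_i] = \mu$. Then, for any $\varepsilon > 0$,

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \to 0 \quad \text{as } n \to \infty$$

Proof We shall prove the theorem only under the additional assumption that the
random variables have a finite variance $\sigma^2$. Now, since

$$E\left[\frac{X_1 + \cdots + X_n}{n}\right] = \mu \quad \text{and} \quad \mathrm{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{\sigma^2}{n}$$

it follows from Chebyshev’s inequality that

$$P\left\{\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \le \frac{\sigma^2}{n\varepsilon^2}$$

and the result is proven.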
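The bound used in the proof can be compared with an exact computation in a simple special case. The sketch below assumes fair-coin Bernoulli variables (so $\mu = 1/2$ and $\sigma^2 = 1/4$) and a tolerance of 1/10; both choices, and the function names, are ours and serve only as an illustration.

```python
from fractions import Fraction
from math import comb

def exact_tail(n, eps):
    """P{|S_n/n - 1/2| >= eps} for S_n ~ Binomial(n, 1/2), computed exactly."""
    count = sum(comb(n, k) for k in range(n + 1)
                if abs(Fraction(k, n) - Fraction(1, 2)) >= eps)
    return Fraction(count, 2 ** n)

def chebyshev_bound(n, eps):
    """sigma^2 / (n eps^2) with sigma^2 = 1/4, capped at 1."""
    return min(Fraction(1), Fraction(1, 4) / (n * eps ** 2))

eps = Fraction(1, 10)
rows = [(n, float(exact_tail(n, eps)), float(chebyshev_bound(n, eps)))
        for n in (10, 100, 1000)]
for n, exact, bound in rows:
    print(n, exact, bound)
```

Both columns shrink as n grows, with the exact probability falling much faster than the bound. The bound is all the proof needs, since it already tends to 0.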


The weak law of large numbers was originally proven by James Bernoulli for the 
special case where the X; are 0, 1 (that is, Bernoulli) random variables. His 
statement and proof of this theorem were presented in his book Ars Conjectandi, 
which was published in 1713, eight years after his death, by his nephew Nicholas 
Bernoulli. Note that because Chebyshev’s inequality was not known in Bernoulli’s 
time, Bernoulli had to resort to a quite ingenious proof to establish the result. The 
general form of the weak law of large numbers presented in Theorem 2.1 was
proved by the Russian mathematician Khintchine. 


8.3 The Central Limit Theorem 
The central limit theorem is one of the most remarkable results in probability theory.
Loosely put, it states that the sum of a large number of independent random 
variables has a distribution that is approximately normal. Hence, it not only provides 
a simple method for computing approximate probabilities for sums of independent 
random variables, but also helps explain the remarkable fact that the empirical 
frequencies of so many natural populations exhibit bell-shaped (that is, normal) 
curves. 


In its simplest form, the central limit theorem is as follows. 


Theorem 3.1 The central limit theorem 


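Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, each having mean $\mu$ and variance $\sigma^2$. Then the distribution of

$$\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

tends to the standard normal as $n \to \infty$. That is, for $-\infty < a < \infty$,

$$P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\,dx \quad \text{as } n \to \infty$$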


The key to the proof of the central limit theorem is the following lemma, which we 
state without proof. 


Lemma 3.1 


Let $Z_1, Z_2, \ldots$ be a sequence of random variables having distribution functions
$F_{Z_n}$ and moment generating functions $M_{Z_n}$, $n \ge 1$, and let Z be a random variable
having distribution function $F_Z$ and moment generating function $M_Z$. If
$M_{Z_n}(t) \to M_Z(t)$ for all t, then $F_{Z_n}(t) \to F_Z(t)$ for all t at which $F_Z(t)$ is
continuous.


If we let Z be a standard normal random variable, then, since $M_Z(t) = e^{t^2/2}$, it
follows from Lemma 3.1 that if $M_{Z_n}(t) \to e^{t^2/2}$ as $n \to \infty$, then $F_{Z_n}(t) \to \Phi(t)$
as $n \to \infty$.
We are now ready to prove the central limit theorem.


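Proof of the Central Limit Theorem: Let us assume at first that $\mu = 0$ and
$\sigma^2 = 1$. We shall prove the theorem under the assumption that the moment
generating function of the $X_i$, M(t), exists and is finite. Now, the moment
generating function of $X_i/\sqrt{n}$ is given by

$$E\left[\exp\left\{\frac{tX_i}{\sqrt{n}}\right\}\right] = M\left(\frac{t}{\sqrt{n}}\right)$$

Thus, the moment generating function of $\sum_{i=1}^{n} X_i/\sqrt{n}$ is given by $\left[M\left(\frac{t}{\sqrt{n}}\right)\right]^n$. Let

$$L(t) = \log M(t)$$

and note that

$$L(0) = 0$$

$$L'(0) = \frac{M'(0)}{M(0)} = \mu = 0$$

$$L''(0) = \frac{M(0)M''(0) - [M'(0)]^2}{[M(0)]^2} = E[X^2] = 1$$

Now, to prove the theorem, we must show that $[M(t/\sqrt{n})]^n \to e^{t^2/2}$ as $n \to \infty$, or,
equivalently, that $nL(t/\sqrt{n}) \to t^2/2$ as $n \to \infty$. To show this, note that

$$\lim_{n \to \infty} \frac{L(t/\sqrt{n})}{n^{-1}} = \lim_{n \to \infty} \frac{-L'(t/\sqrt{n})\,n^{-3/2}\,t}{-2n^{-2}} \quad \text{by L’Hôpital’s rule}$$

$$= \lim_{n \to \infty} \left[\frac{L'(t/\sqrt{n})\,t}{2n^{-1/2}}\right]$$

$$= \lim_{n \to \infty} \left[\frac{-L''(t/\sqrt{n})\,n^{-3/2}\,t^2}{-2n^{-3/2}}\right] \quad \text{again by L’Hôpital’s rule}$$

$$= \lim_{n \to \infty} \left[L''\left(\frac{t}{\sqrt{n}}\right)\frac{t^2}{2}\right] = \frac{t^2}{2}$$

Thus, the central limit theorem is proven when $\mu = 0$ and $\sigma^2 = 1$. The result now
follows in the general case by considering the standardized random variables
$X_i^* = (X_i - \mu)/\sigma$ and applying the preceding result, since $E[X_i^*] = 0$, $\mathrm{Var}(X_i^*) = 1$.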
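The limit $nL(t/\sqrt{n}) \to t^2/2$ can be watched numerically for a concrete distribution. As an illustration only, the sketch below assumes that each $X_i$ equals +1 or −1 with probability 1/2 (mean 0, variance 1), for which $M(t) = \cosh t$; this distribution and the function names are our choice, not part of the proof.

```python
import math

def L(t):
    """Log moment generating function of a +/-1 fair coin: log cosh(t)."""
    return math.log(math.cosh(t))

def scaled_log_mgf(n, t):
    """n * L(t / sqrt(n)), the log MGF of (X_1 + ... + X_n) / sqrt(n)."""
    return n * L(t / math.sqrt(n))

t = 1.5
limit = t ** 2 / 2
values = {n: scaled_log_mgf(n, t) for n in (1, 10, 100, 10_000)}
for n, v in values.items():
    print(n, v, abs(v - limit))
```

The printed gap to $t^2/2$ shrinks roughly like 1/n for this distribution, in line with the second-order expansion of L about 0 that underlies the L’Hôpital argument.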


Remark Although Theorem 3.1 states only that, for each a,

$$P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right\} \to \Phi(a)$$

it can, in fact, be shown that the convergence is uniform in a. [We say that
$f_n(a) \to f(a)$ uniformly in a if, for each $\varepsilon > 0$, there exists an N such that
$|f_n(a) - f(a)| < \varepsilon$ for all a whenever $n \ge N$.]


The first version of the central limit theorem was proven by DeMoivre around 1733
for the special case where the $X_i$ are Bernoulli random variables with p = 1/2. The
theorem was subsequently extended by Laplace to the case of arbitrary p. (Since a
binomial random variable may be regarded as the sum of n independent and 
identically distributed Bernoulli random variables, this justifies the normal 
approximation to the binomial that was presented in Section 5.4.1.) Laplace also 
discovered the more general form of the central limit theorem given in Theorem 
3.1. His proof, however, was not completely rigorous and, in fact, cannot easily be 
made rigorous. A truly rigorous proof of the central limit theorem was first presented 
by the Russian mathematician Liapounoff in the period 1901-1902. 


Figure 8.1 illustrates the central limit theorem by plotting the probability mass
functions of the sum of n independent random variables having a specified mass
function when (a) n = 5, (b) n = 10, (c) n = 25, and (d) n = 100.


Figure 8.1(a), Figure 8.1(b), Figure 8.1(c), Figure 8.1(d)

[Figure panels titled “Central Limit Theorem”. Each panel carries the text “Enter
the probabilities and the number of random variables to be summed. The output
gives the mass function of the sum along with its mean and variance.”]

Example 3a 


An astronomer is interested in measuring the distance, in light-years, from his 
observatory to a distant star. Although the astronomer has a measuring 
technique, he knows that because of changing atmospheric conditions and 
normal error, each time a measurement is made, it will not yield the exact 
distance, but merely an estimate. As a result, the astronomer plans to make a 
series of measurements and then use the average value of these measurements 
as his estimated value of the actual distance. If the astronomer believes that the 
values of the measurements are independent and identically distributed random 
variables having a common mean d (the actual distance) and a common 
variance of 4 (light-years), how many measurements need he make to be 
reasonably sure that his estimated distance is accurate to within +.5 light-year? 


Solution 


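Suppose that the astronomer decides to make n observations. If $X_1, X_2, \ldots, X_n$
are the n measurements, then, from the central limit theorem, it follows that

$$Z_n = \frac{\sum_{i=1}^{n} X_i - nd}{2\sqrt{n}}$$

has approximately a standard normal distribution. Hence,

$$P\left\{-.5 \le \frac{\sum_{i=1}^{n} X_i}{n} - d \le .5\right\} = P\left\{-.5\,\frac{\sqrt{n}}{2} \le Z_n \le .5\,\frac{\sqrt{n}}{2}\right\} \approx \Phi\left(\frac{\sqrt{n}}{4}\right) - \Phi\left(-\frac{\sqrt{n}}{4}\right) = 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1$$

Therefore, if the astronomer wants, for instance, to be 95 percent certain that his
estimated value is accurate to within .5 light-year, he should make $n^*$
measurements, where $n^*$ is such that

$$2\Phi\left(\frac{\sqrt{n^*}}{4}\right) - 1 = .95 \quad \text{or} \quad \Phi\left(\frac{\sqrt{n^*}}{4}\right) = .975$$

Thus, from Table 5.1 of Chapter 5,

$$\frac{\sqrt{n^*}}{4} = 1.96 \quad \text{or} \quad n^* = (7.84)^2 \approx 61.47$$

As $n^*$ is not integral valued, he should make 62 observations.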


Note, however, that the preceding analysis has been done under the assumption 
that the normal approximation will be a good approximation when n = 62. 
Although this will usually be the case, in general the question of how large n 
need be before the approximation is “good” depends on the distribution of the X;. 
If the astronomer is concerned about this point and wants to take no chances, he 
can still solve his problem by using Chebyshev’s inequality. Since 


$$E\left[\sum_{i=1}^{n} \frac{X_i}{n}\right] = d \qquad \mathrm{Var}\left(\sum_{i=1}^{n} \frac{X_i}{n}\right) = \frac{4}{n}$$

Chebyshev’s inequality yields

$$P\left\{\left|\sum_{i=1}^{n} \frac{X_i}{n} - d\right| > .5\right\} \le \frac{4}{n(.5)^2} = \frac{16}{n}$$

Hence, if he makes n = 16/.05 = 320 observations, he can be 95 percent certain
that his estimate will be accurate to within .5 light-year.
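Both sample-size calculations in Example 3a follow a fixed recipe, which can be sketched as below. The function names and parameters are hypothetical; the normal quantile comes from `statistics.NormalDist.inv_cdf` instead of Table 5.1, so it is 1.95996... rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def n_by_clt(sigma, half_width, confidence):
    """Smallest n with 2*Phi(half_width*sqrt(n)/sigma) - 1 >= confidence."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z * sigma / half_width) ** 2)

def n_by_chebyshev(sigma, half_width, confidence):
    """Smallest n with sigma^2 / (n * half_width^2) <= 1 - confidence."""
    n = sigma ** 2 / (half_width ** 2 * (1 - confidence))
    # Round first so that floating-point noise cannot push an exact
    # integer answer up to the next integer.
    return math.ceil(round(n, 9))

print(n_by_clt(2, 0.5, 0.95), n_by_chebyshev(2, 0.5, 0.95))
```

The Chebyshev figure is about five times the normal-approximation figure, reflecting that it makes no assumption about the shape of the distribution of the measurements.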


Example 3b 


The number of students who enroll in a psychology course is a Poisson random 
variable with mean 100. The professor in charge of the course has decided that if 
the number enrolling is 120 or more, he will teach the course in two separate 
sections, whereas if fewer than 120 students enroll, he will teach all of the 
students together in a single section. What is the probability that the professor 
will have to teach two sections? 


Solution 


The exact solution

$$e^{-100} \sum_{i=120}^{\infty} \frac{(100)^i}{i!}$$

does not readily yield a numerical answer. However, by recalling that a Poisson 
random variable with mean 100 is the sum of 100 independent Poisson random 
variables, each with mean 1, we can make use of the central limit theorem to 
obtain an approximate solution. If X denotes the number of students who enroll in 
the course, we have 


$$P\{X \ge 120\} = P\{X \ge 119.5\} \quad \text{(the continuity correction)}$$

$$= P\left\{\frac{X - 100}{\sqrt{100}} \ge \frac{119.5 - 100}{\sqrt{100}}\right\}$$

$$\approx 1 - \Phi(1.95)$$

$$\approx .0256$$

where we have used the fact that the variance of a Poisson random variable is 
equal to its mean. 
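Although the exact sum is awkward by hand, it is easy to evaluate by machine, which gives a way to judge the approximation. The following is a minimal sketch (the function name `poisson_tail` is ours); it accumulates the Poisson probabilities below 120 term by term and subtracts from 1.

```python
import math
from statistics import NormalDist

def poisson_tail(mean, k):
    """P{X >= k} for a Poisson random variable, as 1 - P{X <= k - 1}."""
    term = math.exp(-mean)      # P{X = 0}
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= mean / (i + 1)  # P{X = i + 1} from P{X = i}
    return 1 - cdf

exact = poisson_tail(100, 120)
# Normal approximation with the continuity correction; variance = mean = 100.
approx = 1 - NormalDist(mu=100, sigma=10).cdf(119.5)
print(exact, approx)
```

The normal approximation comes out somewhat below the exact tail, which is what one would expect given that the Poisson distribution is skewed to the right.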


Example 3c 


If 10 fair dice are rolled, find the approximate probability that the sum obtained is 
between 30 and 40, inclusive. 


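Solution

Let $X_i$ denote the value of the ith die, i = 1, 2, ..., 10, and let $X = \sum_{i=1}^{10} X_i$
denote the sum. Since

$$E(X_i) = \frac{7}{2}, \qquad \mathrm{Var}(X_i) = E[X_i^2] - (E[X_i])^2 = \frac{35}{12},$$

the central limit theorem yields

$$P\{29.5 \le X \le 40.5\} = P\left\{\frac{29.5 - 35}{\sqrt{\frac{350}{12}}} \le \frac{X - 35}{\sqrt{\frac{350}{12}}} \le \frac{40.5 - 35}{\sqrt{\frac{350}{12}}}\right\}$$

$$\approx 2\Phi(1.0184) - 1$$

$$\approx .692$$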
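Because the sum of 10 dice takes only finitely many values, its mass function can be built exactly by repeated convolution and compared with the approximation. This is a sketch with names of our own choosing (`sum_pmf`), not a method given in the text.

```python
from fractions import Fraction
from statistics import NormalDist

def sum_pmf(n_dice):
    """Exact mass function of the sum of n_dice fair dice, by convolution."""
    pmf = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + p / 6
        pmf = nxt
    return pmf

pmf = sum_pmf(10)
exact = float(sum(p for s, p in pmf.items() if 30 <= s <= 40))

# Normal approximation with the continuity correction, as in the solution.
sd = (10 * 35 / 12) ** 0.5
normal = NormalDist(mu=35, sigma=sd)
approx = normal.cdf(40.5) - normal.cdf(29.5)
print(exact, approx)
```

The same convolution routine, applied to an arbitrary mass function instead of a fair die, produces the kind of plot described for Figure 8.1.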


Example 3d 


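Let $X_i$, i = 1, ..., 10, be independent random variables, each uniformly distributed
over (0, 1). Calculate an approximation to $P\left\{\sum_{i=1}^{10} X_i > 6\right\}$.

Solution

Since $E[X_i] = \frac{1}{2}$ and $\mathrm{Var}(X_i) = \frac{1}{12}$, we have, by the central limit theorem,

$$P\left\{\sum_{i=1}^{10} X_i > 6\right\} = P\left\{\frac{\sum_{i=1}^{10} X_i - 5}{\sqrt{10\left(\frac{1}{12}\right)}} > \frac{6 - 5}{\sqrt{10\left(\frac{1}{12}\right)}}\right\}$$

$$\approx 1 - \Phi(\sqrt{1.2})$$

$$\approx .1367$$

Hence, $\sum_{i=1}^{10} X_i$ will be greater than 6 only 14 percent of the time.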
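For uniform summands the exact answer is also available in closed form, since the sum of n independent uniform (0, 1) variables has the Irwin-Hall distribution. That formula is not derived in this chapter and is brought in here only as a check on the approximation; the function name below is ours.

```python
from fractions import Fraction
from math import comb, factorial, floor, sqrt
from statistics import NormalDist

def irwin_hall_cdf(x, n):
    """P{U_1 + ... + U_n <= x} for independent uniform (0, 1) U_i, 0 <= x <= n."""
    x = Fraction(x)
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n
                for k in range(floor(x) + 1))
    return total / factorial(n)

exact = 1 - irwin_hall_cdf(6, 10)
approx = 1 - NormalDist().cdf(sqrt(1.2))
print(float(exact), approx)
```

The exact value is 504046/3628800, about .1389, so the central limit approximation is already accurate to within about .002 with only 10 summands.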


Example 3e 


An instructor has 50 exams that will be graded in sequence. The times required 
to grade the 50 exams are independent, with a common distribution that has 
mean 20 minutes and standard deviation 4 minutes. Approximate the probability 
that the instructor will grade at least 25 of the exams in the first 450 minutes of
work.


Solution 


If we let $X_i$ be the time that it takes to grade exam i, then

$$X = \sum_{i=1}^{25} X_i$$

is the time it takes to grade the first 25 exams. Because the instructor will grade
at least 25 exams in the first 450 minutes of work if the time it takes to grade the
first 25 exams is less than or equal to 450, we see that the desired probability is
$P\{X \le 450\}$. To approximate this probability, we use the central limit theorem.
Now,

$$E[X] = \sum_{i=1}^{25} E[X_i] = 25(20) = 500$$

and

$$\mathrm{Var}(X) = \sum_{i=1}^{25} \mathrm{Var}(X_i) = 25(16) = 400$$

Consequently, with Z being a standard normal random variable, we have

$$P\{X \le 450\} = P\left\{\frac{X - 500}{\sqrt{400}} \le \frac{450 - 500}{\sqrt{400}}\right\}$$

$$\approx P\{Z \le -2.5\}$$

$$= P\{Z \ge 2.5\}$$

$$= 1 - \Phi(2.5) \approx .006$$
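The arithmetic of Example 3e reduces to one normal probability, which can be sketched as follows (variable names are ours). No continuity correction is applied, as the grading times are not assumed to be integer valued.

```python
from statistics import NormalDist

n_exams, mean_time, sd_time, budget = 25, 20, 4, 450

# Sum of 25 independent grading times: mean 25(20), variance 25(16).
total_time = NormalDist(mu=n_exams * mean_time,
                        sigma=sd_time * n_exams ** 0.5)
prob = total_time.cdf(budget)  # approximates P{X <= 450}
print(prob)
```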


Central limit theorems also exist when the X; are independent, but not 
necessarily identically distributed random variables. One version, by no means 
the most general, is as follows. 


Theorem 3.2 Central limit theorem for independent random variables 


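Let $X_1, X_2, \ldots$ be a sequence of independent random variables having respective
means and variances $\mu_i = E[X_i]$, $\sigma_i^2 = \mathrm{Var}(X_i)$. If (a) the $X_i$ are uniformly
bounded (that is, if for some M, $P\{|X_i| < M\} = 1$ for all i) and (b) $\sum_{i=1}^{\infty} \sigma_i^2 = \infty$,
then

$$P\left\{\frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \le a\right\} \to \Phi(a) \quad \text{as } n \to \infty$$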


Historical note: Pierre-Simon, Marquis de Laplace (1749-1827) 


The central limit theorem was originally stated and proven by the French 
mathematician Pierre-Simon, Marquis de Laplace, who came to the theorem 
from his observations that errors of measurement (which can usually be 
regarded as being the sum of a large number of tiny forces) tend to be normally 
distributed. Laplace, who was also a famous astronomer (and indeed was 
called “the Newton of France”), was one of the great early contributors to both 
probability and statistics. Laplace was also a popularizer of the uses of 
probability in everyday life. He strongly believed in its importance, as is 
indicated by the following quotations taken from his published book Analytical 
Theory of Probability: “We see that the theory of probability is at bottom only 
common sense reduced to calculation; it makes us appreciate with exactitude 
what reasonable minds feel by a sort of instinct, often without being able to 
account for it.... It is remarkable that this science, which originated in the 
consideration of games of chance, should become the most important object of 
human knowledge.... The most important questions of life are, for the most 
part, really only problems of probability.” 


The application of the central limit theorem to show that measurement errors 
are approximately normally distributed is regarded as an important contribution 
to science. Indeed, in the seventeenth and eighteenth centuries, the central 
limit theorem was often called the law of frequency of errors. Listen to the 
words of Francis Galton (taken from his book Natural Inheritance, published in 
1889): “I know of scarcely anything so apt to impress the imagination as the 
wonderful form of cosmic order expressed by the ‘Law of Frequency of Error.’ 
The Law would have been personified by the Greeks and deified, if they had 
known of it. It reigns with serenity and in complete self-effacement amidst the 
wildest confusion. The huger the mob and the greater the apparent anarchy, the 
more perfect is its sway. It is the supreme law of unreason.” 


8.4 The Strong Law of Large Numbers 
The strong law of large numbers is probably the best-known result in probability
theory. It states that the average of a sequence of independent random variables 
having a common distribution will, with probability 1, converge to the mean of that 
distribution. 


Theorem 4.1 The strong law of large numbers 


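Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random
variables, each having a finite mean $\mu = E[X_i]$. Then, with probability 1,

$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{as } n \to \infty\ ^{\dagger}$$

†That is, the strong law of large numbers states that

$$P\left\{\lim_{n \to \infty} (X_1 + \cdots + X_n)/n = \mu\right\} = 1$$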


As an application of the strong law of large numbers, suppose that a sequence of 
independent trials of some experiment is performed. Let E be a fixed event of the 


experiment, and denote by P(E) the probability that E occurs on any particular 
trial. Letting 


$$X_i = \begin{cases} 1 & \text{if } E \text{ occurs on the } i\text{th trial} \\ 0 & \text{if } E \text{ does not occur on the } i\text{th trial} \end{cases}$$

we have, by the strong law of large numbers, that with probability 1,

$$\frac{X_1 + \cdots + X_n}{n} \to E[X] = P(E) \tag{4.1}$$

Since $X_1 + \cdots + X_n$ represents the number of times that the event E occurs in the
first n trials, we may interpret Equation (4.1) as stating that with probability 1,
the limiting proportion of time that the event E occurs is just P(E).


Although the theorem can be proven without this assumption, our proof of the 
strong law of large numbers will assume that the random variables $X_i$ have a 
finite fourth moment. That is, we will suppose that $E[X_i^4] = K < \infty$. 


Proof of the Strong Law of Large Numbers: To begin, assume that $\mu$, the 
mean of the $X_i$, is equal to 0. Let $S_n = \sum_{i=1}^{n} X_i$ and consider 

\[ E[S_n^4] = E[(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)(X_1 + \cdots + X_n)] \]

Expanding the right side of the preceding equation results in terms of the form 

\[ X_i^4, \quad X_i^3 X_j, \quad X_i^2 X_j^2, \quad X_i^2 X_j X_k, \quad \text{and} \quad X_i X_j X_k X_l \]

where i, j, k, and l are all different. Because all the $X_i$ have mean 0, it follows by 
independence that 

\[
\begin{aligned}
E[X_i^3 X_j] &= E[X_i^3]E[X_j] = 0 \\
E[X_i^2 X_j X_k] &= E[X_i^2]E[X_j]E[X_k] = 0 \\
E[X_i X_j X_k X_l] &= 0
\end{aligned}
\]

Now, for a given pair i and j, there will be $\binom{4}{2} = 6$ terms in the expansion that will 
equal $X_i^2 X_j^2$. Hence, upon expanding the preceding product and taking 
expectations term by term, it follows that 

\[
\begin{aligned}
E[S_n^4] &= nE[X_i^4] + 6\binom{n}{2}E[X_i^2 X_j^2] \\
&= nK + 3n(n-1)E[X_i^2]E[X_j^2]
\end{aligned}
\]


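where we have once again made use of the independence assumption. Now, 
since 

\[ 0 \le \mathrm{Var}(X_i^2) = E[X_i^4] - (E[X_i^2])^2 \]

we have 

\[ (E[X_i^2])^2 \le E[X_i^4] = K \]

Therefore, from the preceding, we obtain 

\[ E[S_n^4] \le nK + 3n(n-1)K \]

which implies that 

\[ E\left[\frac{S_n^4}{n^4}\right] \le \frac{K}{n^3} + \frac{3K}{n^2} \]

Therefore, 

\[ E\left[\sum_{n=1}^{\infty} \frac{S_n^4}{n^4}\right] = \sum_{n=1}^{\infty} E\left[\frac{S_n^4}{n^4}\right] < \infty \]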


But the preceding implies that with probability 1, $\sum_{n=1}^{\infty} S_n^4/n^4 < \infty$. (For if there is 
a positive probability that the sum is infinite, then its expected value is infinite.) 
But the convergence of a series implies that its nth term goes to 0; so we can 
conclude that with probability 1, 

\[ \lim_{n \to \infty} \frac{S_n^4}{n^4} = 0 \]

But if $S_n^4/n^4 = (S_n/n)^4$ goes to 0, then so must $S_n/n$; hence, we have proven 
that with probability 1, 

\[ \frac{S_n}{n} \to 0 \quad \text{as } n \to \infty \]

When $\mu$, the mean of the $X_i$, is not equal to 0, we can apply the preceding 
argument to the random variables $X_i - \mu$ to obtain that with probability 1, 

\[ \lim_{n \to \infty} \sum_{i=1}^{n} \frac{X_i - \mu}{n} = 0 \]

or, equivalently, 

\[ \lim_{n \to \infty} \sum_{i=1}^{n} \frac{X_i}{n} = \mu \]

which proves the result. 
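
The moment computation at the heart of the proof can be checked exactly in a simple special case. The sketch below assumes (this is an illustration, not part of the proof) that each $X_i$ equals +1 or -1 with probability 1/2, so that $K = E[X_i^4] = 1$ and $E[X_i^2] = 1$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def fourth_moment_of_sum(n):
    """Exact E[S_n^4] when S_n is a sum of n independent +/-1 steps (prob. 1/2 each)."""
    total = Fraction(0)
    for wins in range(n + 1):
        s = 2 * wins - n  # value of S_n when exactly `wins` steps equal +1
        total += Fraction(comb(n, wins), 2 ** n) * s ** 4
    return total


def formula_from_proof(n, K=1, second_moment=1):
    """nK + 3n(n-1) E[X_i^2] E[X_j^2], the expression derived in the proof."""
    return n * K + 3 * n * (n - 1) * second_moment ** 2


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, fourth_moment_of_sum(n), formula_from_proof(n))
```

For this distribution the two quantities agree for every n, and dividing by $n^4$ gives terms of order $1/n^2$, which is what makes the series in the proof summable.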


Figure 8.2 illustrates the strong law by giving the results of a simulation of n 
independent random variables having a specified probability mass function. The 
averages of the n variables are given when (a) n = 100, (b) n = 1000, and (c) 
n = 10,000. 

Figure 8.2 Strong Law of Large Numbers (simulation output; Theoretical Mean = 2.05 in each panel) 
(a) n = 100: Sample Mean = 1.89 
(b) n = 1000: Sample Mean = 2.078 
(c) n = 10,000: Sample Mean = 2.0416 
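
A simulation of the kind summarized in Figure 8.2 can be sketched in a few lines of Python. The probability mass function used for the figure is not given in the text, so the one below is purely an assumption, chosen only because its mean is 2.05; the names `VALUES`, `PROBS`, and `sample_mean` are likewise hypothetical.

```python
import random

# Hypothetical pmf (not the one behind Figure 8.2); its mean is 2.05.
VALUES = [0, 1, 2, 3, 4]
PROBS = [0.10, 0.20, 0.30, 0.35, 0.05]


def theoretical_mean(values=VALUES, probs=PROBS):
    """Mean of the discrete distribution."""
    return sum(v * p for v, p in zip(values, probs))


def sample_mean(n, seed=0, values=VALUES, probs=PROBS):
    """Average of n independent draws from the pmf."""
    rng = random.Random(seed)
    draws = rng.choices(values, weights=probs, k=n)
    return sum(draws) / n


if __name__ == "__main__":
    print("theoretical mean:", theoretical_mean())
    for n in (100, 1000, 10_000):
        print(n, sample_mean(n))
```

The printed averages depend on the seed, but for large n they should settle near the theoretical mean, as the strong law asserts.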


Many students are initially confused about the difference between the weak and the 
strong laws of large numbers. The weak law of large numbers states that for any 
specified large value $n^*$, $(X_1 + \cdots + X_{n^*})/n^*$ is likely to be near $\mu$. However, it does 
not say that $(X_1 + \cdots + X_n)/n$ is bound to stay near $\mu$ for all values of n larger than 
$n^*$. Thus, it leaves open the possibility that large values of $|(X_1 + \cdots + X_n)/n - \mu|$ 
can occur infinitely often (though at infrequent intervals). The strong law shows that 
this cannot occur. In particular, it implies that, with probability 1, for any positive value 
$\varepsilon$, 

\[ \left| \sum_{i=1}^{n} \frac{X_i}{n} - \mu \right| \]

will be greater than $\varepsilon$ only a finite number of times. 


The strong law of large numbers was originally proven, in the special case of 
Bernoulli random variables, by the French mathematician Borel. The general form of 
the strong law presented in Theorem 4.1 was proven by the Russian 
mathematician A. N. Kolmogorov. 


8.5 Other Inequalities and a Poisson Limit Result 


We are sometimes confronted with situations in which we are interested in obtaining 
an upper bound for a probability of the form $P\{X - \mu \ge a\}$, where a is some positive 
value and when only the mean $\mu = E[X]$ and variance $\sigma^2 = \mathrm{Var}(X)$ of the distribution 
of X are known. Of course, since $X - \mu \ge a > 0$ implies that $|X - \mu| \ge a$, it follows 
from Chebyshev's inequality that 

\[ P\{X - \mu \ge a\} \le P\{|X - \mu| \ge a\} \le \frac{\sigma^2}{a^2} \quad \text{when } a > 0 \]


However, as the following proposition shows, it turns out that we can do better. 


Proposition 5.1 One-sided Chebyshev inequality 


If X is a random variable with mean 0 and finite variance $\sigma^2$, then, for any a > 0, 

\[ P\{X \ge a\} \le \frac{\sigma^2}{\sigma^2 + a^2} \]

Proof Let b > 0 and note that 

\[ X \ge a \quad \text{is equivalent to} \quad X + b \ge a + b \]

Hence, 

\[
\begin{aligned}
P\{X \ge a\} &= P\{X + b \ge a + b\} \\
&\le P\{(X + b)^2 \ge (a + b)^2\}
\end{aligned}
\]

where the inequality is obtained by noting that since a + b > 0, $X + b \ge a + b$ 
implies that $(X + b)^2 \ge (a + b)^2$. Upon applying Markov's inequality, the 
preceding yields that 

\[ P\{X \ge a\} \le \frac{E[(X + b)^2]}{(a + b)^2} = \frac{\sigma^2 + b^2}{(a + b)^2} \]

Letting $b = \sigma^2/a$ [which is easily seen to be the value of b that minimizes 
$(\sigma^2 + b^2)/(a + b)^2$] gives the desired result. 
Example 5a 


If the number of items produced in a factory during a week is a random variable 
with mean 100 and variance 400, compute an upper bound on the probability that 
this week's production will be at least 120. 


Solution 
It follows from the one-sided Chebyshev inequality that 

\[ P\{X \ge 120\} = P\{X - 100 \ge 20\} \le \frac{400}{400 + (20)^2} = \frac{1}{2} \]

Hence, the probability that this week's production will be 120 or more is at most 
1/2. 

If we attempted to obtain a bound by applying Markov's inequality, then we would 
have obtained 

\[ P\{X \ge 120\} \le \frac{E(X)}{120} = \frac{5}{6} \]

which is a far weaker bound than the preceding one. 
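
The two bounds in Example 5a are easy to reproduce. The following is a minimal sketch using exact fractions; the helper names `one_sided_chebyshev` and `markov` are illustrative and not from the text.

```python
from fractions import Fraction


def one_sided_chebyshev(variance, a):
    """Upper bound sigma^2 / (sigma^2 + a^2) on P{X - mean >= a}, for a > 0."""
    variance, a = Fraction(variance), Fraction(a)
    return variance / (variance + a ** 2)


def markov(mean, a):
    """Upper bound E[X] / a on P{X >= a} for a nonnegative X and a > 0."""
    return Fraction(mean) / Fraction(a)


if __name__ == "__main__":
    # Example 5a: mean 100, variance 400, threshold 120.
    print(one_sided_chebyshev(400, 20))  # bound on P{X - 100 >= 20}
    print(markov(100, 120))              # bound on P{X >= 120}
```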


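Suppose now that X has mean $\mu$ and variance $\sigma^2$. Since both $X - \mu$ and $\mu - X$ have 
mean 0 and variance $\sigma^2$, it follows from the one-sided Chebyshev inequality that, for 
a > 0, 

\[ P\{X - \mu \ge a\} \le \frac{\sigma^2}{\sigma^2 + a^2} \]

and 

\[ P\{\mu - X \ge a\} \le \frac{\sigma^2}{\sigma^2 + a^2} \]

Thus, we have the following corollary. 

Corollary 5.1 

If $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$, then, for a > 0, 

\[
\begin{aligned}
P\{X \ge \mu + a\} &\le \frac{\sigma^2}{\sigma^2 + a^2} \\
P\{X \le \mu - a\} &\le \frac{\sigma^2}{\sigma^2 + a^2}
\end{aligned}
\]
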
Example 5b 


A set of 200 people consisting of 100 men and 100 women is randomly divided 
into 100 pairs of 2 each. Give an upper bound to the probability that at most 30 of 
these pairs will consist of a man and a woman. 


Solution 


Number the men arbitrarily from 1 to 100, and for i = 1, 2, ..., 100, let 

\[ X_i = \begin{cases} 1 & \text{if man } i \text{ is paired with a woman} \\ 0 & \text{otherwise} \end{cases} \]

Then X, the number of man-woman pairs, can be expressed as 

\[ X = \sum_{i=1}^{100} X_i \]

Because man i is equally likely to be paired with any of the other 199 people, of 
which 100 are women, we have 

\[ E[X_i] = P\{X_i = 1\} = \frac{100}{199} \]

Similarly, for $i \ne j$, 

\[
\begin{aligned}
E[X_i X_j] &= P\{X_i = 1, X_j = 1\} \\
&= P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} = \frac{100}{199} \cdot \frac{99}{197}
\end{aligned}
\]

where $P\{X_j = 1 \mid X_i = 1\} = 99/197$, since, given that man i is paired with a 
woman, man j is equally likely to be paired with any of the remaining 197 people, 
of which 99 are women. Hence, we obtain 

\[
\begin{aligned}
E[X] &= \sum_{i=1}^{100} E[X_i] = (100)\frac{100}{199} \approx 50.25 \\
\mathrm{Var}(X) &= \sum_{i=1}^{100} \mathrm{Var}(X_i) + 2\sum_{i<j} \mathrm{Cov}(X_i, X_j) \\
&= 100 \cdot \frac{100}{199} \cdot \frac{99}{199} + 2\binom{100}{2}\left[\frac{100}{199} \cdot \frac{99}{197} - \left(\frac{100}{199}\right)^2\right] \\
&\approx 25.126
\end{aligned}
\]

The Chebyshev inequality then yields 

\[ P\{X \le 30\} \le P\{|X - 50.25| \ge 20.25\} \le \frac{25.126}{(20.25)^2} \approx .061 \]

Thus, there are at most about 6 chances in 100 that 30 or fewer men will be 
paired with women. However, we can improve on this bound by using the 
one-sided Chebyshev inequality, which yields 

\[
\begin{aligned}
P\{X \le 30\} &= P\{X \le 50.25 - 20.25\} \\
&\le \frac{25.126}{25.126 + (20.25)^2} \\
&\approx .058
\end{aligned}
\]
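
The arithmetic of Example 5b can be verified with exact fractions. In the sketch below the variable names are illustrative, and the deviation from the mean is taken as the exact value 10000/199 - 30 rather than the rounded 20.25, so agreement with the quoted figures is to three decimals only.

```python
from fractions import Fraction
from math import comb

n_men = 100
p_one = Fraction(100, 199)           # P{X_i = 1}
p_both = p_one * Fraction(99, 197)   # P{X_i = 1, X_j = 1} for i != j

mean = n_men * p_one
variance = n_men * p_one * (1 - p_one) + 2 * comb(n_men, 2) * (p_both - p_one ** 2)

deviation = mean - 30                # distance from the mean down to 30
two_sided_bound = variance / deviation ** 2
one_sided_bound = variance / (variance + deviation ** 2)

if __name__ == "__main__":
    print(float(mean), float(variance))
    print(float(two_sided_bound), float(one_sided_bound))
```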


When the moment generating function of the random variable X is known, we can 
obtain even more effective bounds on $P\{X \ge a\}$. Let 

\[ M(t) = E[e^{tX}] \]

be the moment generating function of the random variable X. Then, for t > 0, 

\[
\begin{aligned}
P\{X \ge a\} &= P\{e^{tX} \ge e^{ta}\} \\
&\le E[e^{tX}]e^{-ta} \quad \text{by Markov's inequality}
\end{aligned}
\]

Similarly, for t < 0, 

\[
\begin{aligned}
P\{X \le a\} &= P\{e^{tX} \ge e^{ta}\} \\
&\le E[e^{tX}]e^{-ta}
\end{aligned}
\]

Thus, we have the following inequalities, known as Chernoff bounds. 


Proposition 5.2 Chernoff bounds 

\[
\begin{aligned}
P\{X \ge a\} &\le e^{-ta}M(t) \quad \text{for all } t > 0 \\
P\{X \le a\} &\le e^{-ta}M(t) \quad \text{for all } t < 0
\end{aligned}
\]

Since each Chernoff bound holds for every t of the appropriate sign, we obtain 
the best bound on $P\{X \ge a\}$ by using the t > 0 that minimizes 
$e^{-ta}M(t)$. 


Example 5c Chernoff bounds for the standard normal random variable 


If Z is a standard normal random variable, then its moment generating function is 
$M(t) = e^{t^2/2}$, so the Chernoff bound on $P\{Z \ge a\}$ is given by 

\[ P\{Z \ge a\} \le e^{-ta}e^{t^2/2} \quad \text{for all } t > 0 \]

Now the value of t, t > 0, that minimizes $e^{t^2/2 - ta}$ is the value that minimizes 
$t^2/2 - ta$, which is t = a. Thus, for a > 0, we have 

\[ P\{Z \ge a\} \le e^{-a^2/2} \]

Similarly, we can show that, for a < 0, 

\[ P\{Z \le a\} \le e^{-a^2/2} \]
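
To get a feel for how tight the normal Chernoff bound $e^{-a^2/2}$ is, it can be compared with the exact upper tail, which is available through the complementary error function. The helper names in this short sketch are illustrative.

```python
import math


def normal_upper_tail(a):
    """Exact P{Z >= a} for a standard normal Z."""
    return 0.5 * math.erfc(a / math.sqrt(2))


def chernoff_normal(a):
    """Chernoff bound exp(-a^2 / 2) on P{Z >= a}, valid for a > 0."""
    return math.exp(-a * a / 2)


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 3.0):
        print(a, normal_upper_tail(a), chernoff_normal(a))
```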


Example 5d Chernoff bounds for the Poisson random variable 


If X is a Poisson random variable with parameter $\lambda$, then its moment generating 
function is $M(t) = e^{\lambda(e^t - 1)}$. Hence, the Chernoff bound on $P\{X \ge i\}$ is 

\[ P\{X \ge i\} \le e^{\lambda(e^t - 1)}e^{-it} \quad t > 0 \]

Minimizing the right side of the preceding inequality is equivalent to minimizing 
$\lambda(e^t - 1) - it$, and calculus shows that the minimal value occurs when $e^t = i/\lambda$. 
Provided that $i/\lambda > 1$, this minimizing value of t will be positive. Therefore, 
assuming that $i > \lambda$ and letting $e^t = i/\lambda$ in the Chernoff bound yields 

\[ P\{X \ge i\} \le e^{\lambda(i/\lambda - 1)}\left(\frac{\lambda}{i}\right)^i \]

or, equivalently, 

\[ P\{X \ge i\} \le \frac{e^{-\lambda}(e\lambda)^i}{i^i} \]
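
The Poisson Chernoff bound can likewise be compared with the exact tail probability. The sketch below uses illustrative function names and assumes i is an integer larger than $\lambda$.

```python
import math


def poisson_upper_tail(lam, i):
    """Exact P{X >= i} = 1 - sum_{k < i} e^{-lam} lam^k / k!."""
    return 1.0 - sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(i))


def chernoff_poisson(lam, i):
    """Chernoff bound e^{-lam} (e * lam)^i / i^i, valid for an integer i > lam."""
    return math.exp(-lam) * (math.e * lam / i) ** i


if __name__ == "__main__":
    lam = 4.0
    for i in (5, 8, 12):
        print(i, poisson_upper_tail(lam, i), chernoff_poisson(lam, i))
```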


Example 5e 


Consider a gambler who is equally likely to either win or lose 1 unit on every play, 
independently of his past results. That is, if $X_i$ is the gambler's winnings on the 
ith play, then the $X_i$ are independent and 

\[ P\{X_i = 1\} = P\{X_i = -1\} = \frac{1}{2} \]

Let $S_n = \sum_{i=1}^{n} X_i$ denote the gambler's winnings after n plays. We will use the 
Chernoff bound on $P\{S_n \ge a\}$. To start, note that the moment generating function 
of $X_i$ is 

\[ E[e^{tX}] = \frac{e^t + e^{-t}}{2} \]

Now, using the Maclaurin expansions of $e^t$ and $e^{-t}$, we see that 

\[
\begin{aligned}
e^t + e^{-t} &= \left(1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots\right) + \left(1 - t + \frac{t^2}{2!} - \frac{t^3}{3!} + \cdots\right) \\
&= 2\left\{1 + \frac{t^2}{2!} + \frac{t^4}{4!} + \cdots\right\} \\
&= 2\sum_{n=0}^{\infty} \frac{t^{2n}}{(2n)!} \\
&\le 2\sum_{n=0}^{\infty} \frac{(t^2/2)^n}{n!} \quad \text{since } (2n)! \ge n!\,2^n \\
&= 2e^{t^2/2}
\end{aligned}
\]

Therefore, 

\[ E[e^{tX}] \le e^{t^2/2} \]

Since the moment generating function of the sum of independent random 
variables is the product of their moment generating functions, we have 

\[ E[e^{tS_n}] = (E[e^{tX}])^n \le e^{nt^2/2} \]

Using the preceding result along with the Chernoff bound gives 

\[ P\{S_n \ge a\} \le e^{-ta}e^{nt^2/2} \quad t > 0 \]

The value of t that minimizes the right side of the preceding is the value that 
minimizes $nt^2/2 - ta$, and this value is t = a/n. Supposing that a > 0 (so that the 
minimizing t is positive) and letting t = a/n in the preceding inequality yields 

\[ P\{S_n \ge a\} \le e^{-a^2/2n} \quad a > 0 \]

This latter inequality yields, for example, 

\[ P\{S_{10} \ge 6\} \le e^{-36/20} \approx .1653 \]

whereas the exact probability is 

\[
\begin{aligned}
P\{S_{10} \ge 6\} &= P\{\text{gambler wins at least 8 of the first 10 games}\} \\
&= \frac{\binom{10}{8} + \binom{10}{9} + \binom{10}{10}}{2^{10}} = \frac{56}{1024} \approx .0547
\end{aligned}
\]
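
Both numbers at the end of Example 5e can be recomputed directly. In this sketch the function names are illustrative; the exact tail uses the fact that $S_n = 2W - n$, where W, the number of wins, is binomial with parameters (n, 1/2).

```python
import math


def chernoff_walk(n, a):
    """Chernoff bound exp(-a^2 / (2n)) on P{S_n >= a}, for a > 0."""
    return math.exp(-a * a / (2 * n))


def exact_walk_tail(n, a):
    """Exact P{S_n >= a}: S_n >= a exactly when the number of wins is >= (n + a) / 2."""
    min_wins = max(0, math.ceil((n + a) / 2))
    return sum(math.comb(n, w) for w in range(min_wins, n + 1)) / 2 ** n


if __name__ == "__main__":
    print(chernoff_walk(10, 6))    # Chernoff bound for n = 10, a = 6
    print(exact_walk_tail(10, 6))  # exact probability, 56/1024
```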


The next inequality is one having to do with expectations rather than probabilities. 
Before stating it, we need the following definition. 
Definition 
A twice-differentiable real-valued function f(x) is said to be convex if $f''(x) \ge 0$ 
for all x; similarly, it is said to be concave if $f''(x) \le 0$. 


Some examples of convex functions are $f(x) = x^2$, $f(x) = e^{ax}$, and $f(x) = -x^{1/n}$ for 
$x \ge 0$. If f(x) is convex, then g(x) = -f(x) is concave, and vice versa. 


Proposition 5.3 Jensen’s inequality 


If f(x) is a convex function, then 

\[ E[f(X)] \ge f(E[X]) \]

provided that the expectations exist and are finite. 

Proof Expanding f(x) in a Taylor's series expansion about $\mu = E[X]$ yields 

\[ f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(\xi)(x - \mu)^2}{2} \]

where $\xi$ is some value between x and $\mu$. Since $f''(\xi) \ge 0$, we obtain 

\[ f(x) \ge f(\mu) + f'(\mu)(x - \mu) \]

Hence, 

\[ f(X) \ge f(\mu) + f'(\mu)(X - \mu) \]

Taking expectations yields 

\[ E[f(X)] \ge f(\mu) + f'(\mu)E[X - \mu] = f(\mu) \]

and the inequality is established. 


Example 5f 


An investor is faced with the following choices: Either she can invest all of her 
money in a risky proposition that would lead to a random return X that has mean 
m, or she can put the money into a risk-free venture that will lead to a return of m 
with probability 1. Suppose that her decision will be made on the basis of 
maximizing the expected value of u(R), where R is her return and u is her utility 
function. By Jensen's inequality, it follows that if u is a concave function, then 
$E[u(X)] \le u(m)$, so the risk-free alternative is preferable, whereas if u is convex, 
then $E[u(X)] \ge u(m)$, so the risky investment alternative would be preferred. 
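
A small numerical instance makes the investor's comparison concrete. Everything specific below is an assumption made for illustration: the risky return is taken to be 50 or 150 with equal probability (so m = 100), the concave utility is the logarithm, and the convex utility is the square.

```python
import math


def expectation(g, outcomes):
    """E[g(X)] for a discrete X given as (value, probability) pairs."""
    return sum(prob * g(x) for x, prob in outcomes)


# Hypothetical risky return: 50 or 150 with equal probability, so its mean is m = 100.
risky = [(50.0, 0.5), (150.0, 0.5)]
m = expectation(lambda x: x, risky)

# Concave utility: Jensen gives E[u(X)] <= u(m), so this gap is nonpositive.
concave_gap = expectation(math.log, risky) - math.log(m)

# Convex utility: Jensen gives E[u(X)] >= u(m), so this gap is nonnegative.
convex_gap = expectation(lambda x: x * x, risky) - m * m

if __name__ == "__main__":
    print(m, concave_gap, convex_gap)
```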


The following proposition, which implies that the covariance of two increasing 
functions of a random variable is nonnegative, is quite useful. 


Proposition 5.4 


If f and g are increasing functions then 

\[ E[f(X)g(X)] \ge E[f(X)]E[g(X)] \]

Proof To prove the preceding inequality suppose that X and Y are independent 
with the same distribution and that f and g are both increasing functions. Then, 
because f and g are increasing, f(X) - f(Y) and g(X) - g(Y) will both be 
nonnegative when $X \ge Y$ and will both be nonpositive when $X \le Y$. Consequently, their 
product is nonnegative. That is, 

\[ (f(X) - f(Y))(g(X) - g(Y)) \ge 0 \]

Taking expectations gives 

\[ E[(f(X) - f(Y))(g(X) - g(Y))] \ge 0 \]

Multiplying through and taking expectations term by term yields 

\[ E[f(X)g(X)] - E[f(X)g(Y)] - E[f(Y)g(X)] + E[f(Y)g(Y)] \ge 0 \tag{5.1} \]

Now, 

\[
\begin{aligned}
E[f(X)g(Y)] &= E[f(X)]E[g(Y)] \quad \text{by independence of } X \text{ and } Y \\
&= E[f(X)]E[g(X)] \quad \text{because } X \text{ and } Y \text{ have the same distribution}
\end{aligned}
\]

Similarly, $E[f(Y)g(X)] = E[f(Y)]E[g(X)] = E[f(X)]E[g(X)]$, and 
$E[f(Y)g(Y)] = E[f(X)g(X)]$. Hence, from Equation (5.1) we obtain that 

\[ 2E[f(X)g(X)] - 2E[f(X)]E[g(X)] \ge 0 \]

which proves the result. 


Example 5g 


Suppose there are m days in a year, and that each person is independently born 
on day r with probability $p_r$, $r = 1, \ldots, m$, $\sum_{r=1}^{m} p_r = 1$. Let $A_{i,j}$ be the event that 
persons i and j are born on the same day. In Example 5c of Chapter 4, 
we showed that the information that persons 1 and 2 have the same birthday 
makes it more likely that persons 1 and 3 have the same birthday. After proving 
this result, we argued that it was intuitive because if “popular days” are the ones 
whose probabilities are relatively large, then knowing that 1 and 2 share the 
same birthday makes it more likely (than when we have no information) that the 
birthday of person 1 is a popular day and that makes it more likely that person 3 
will have the same birthday as does 1. To give credence to this intuition, suppose 
that the days are renumbered so that $p_r$ is an increasing function of r. That is, 
renumber the days so that day 1 is the day with lowest birthday probability, day 2 
is the day with second lowest birthday probability, and so on. Letting X be the 
birthday of person 1, then because the higher numbered days are the most 
popular our intuitive explanation would lead us to believe that the expected value 
of X should increase upon the information that 1 and 2 have the same birthday. 
That is, it should be that $E[X \mid A_{1,2}] \ge E[X]$. To verify this, let Y be the birthday of 
person 2, and note that 

\[
\begin{aligned}
P(X = r \mid A_{1,2}) &= \frac{P(X = r, A_{1,2})}{P(A_{1,2})} \\
&= \frac{P(X = r, Y = r)}{\sum_r P(X = r, Y = r)} \\
&= \frac{p_r^2}{\sum_r p_r^2}
\end{aligned}
\]

Hence, 

\[ E[X \mid A_{1,2}] = \sum_r rP(X = r \mid A_{1,2}) = \frac{\sum_r r p_r^2}{\sum_r p_r^2} \]

Because $E[X] = \sum_r rP(X = r) = \sum_r r p_r$, we need show that 

\[ \sum_r r p_r^2 \ge \left(\sum_r r p_r\right)\left(\sum_r p_r^2\right) \]

But 

\[ E[Xp_X] = \sum_r r p_r P(X = r) = \sum_r r p_r^2, \quad E[p_X] = \sum_r p_r^2, \quad E[X] = \sum_r r p_r \]

and thus we must show that 

\[ E[Xp_X] \ge E[p_X]E[X] \]

which follows from Proposition 5.4 because both f(X) = X and $g(X) = p_X$ 
are increasing functions of X. 
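
The inequality $E[X \mid A_{1,2}] \ge E[X]$ can be checked numerically for any probability vector sorted in increasing order. The sketch below uses hypothetical function names and a made-up four-day "year" purely for illustration.

```python
def mean_birthday(p):
    """E[X] = sum_r r * p_r, with days numbered 1..m."""
    return sum(r * pr for r, pr in enumerate(p, start=1))


def mean_birthday_given_match(p):
    """E[X | A_{1,2}] = sum_r r * p_r^2 / sum_r p_r^2."""
    numerator = sum(r * pr * pr for r, pr in enumerate(p, start=1))
    denominator = sum(pr * pr for pr in p)
    return numerator / denominator


if __name__ == "__main__":
    p = [0.1, 0.2, 0.3, 0.4]  # hypothetical probabilities, increasing in r
    print(mean_birthday(p), mean_birthday_given_match(p))
```

When all days are equally likely the two expectations coincide, which matches the intuition that the shared birthday then carries no information about which day it is.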


When f(x) is an increasing and g(x) is a decreasing function, then it is a simple 
consequence of Proposition 5.4 that 

\[ E[f(X)g(X)] \le E[f(X)]E[g(X)] \]


We leave the verification of the preceding as an exercise. 


Our final example of this section deals with a Poisson limit result. 


Example 5h A Poisson Limit Result 


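Consider a sequence of independent trials, with each trial being a success with 
probability p. If we let Y be the number of trials until there have been a total of r 
successes, then Y is a negative binomial random variable with 

\[ E[Y] = \frac{r}{p}, \quad \mathrm{Var}(Y) = \frac{r(1 - p)}{p^2} \]

Thus, when $p = \frac{r}{r + \lambda}$, we have that 

\[ E[Y] = r + \lambda, \quad \mathrm{Var}(Y) = \frac{\lambda(r + \lambda)}{r} \]

Now, when r is large, $\mathrm{Var}(Y) \approx \lambda$. Thus, as r becomes larger, the mean of Y 
grows proportionately with r, while the variance converges to $\lambda$. Hence, we might 
expect that when r is large Y will be close to its mean value of $r + \lambda$. Now, if we 
let X be the number of failures that result in those Y trials - that is, X is the 
number of failures before there have been a total of r successes - then when r is 
large, because Y is approximately $r + \lambda$, it would seem that X would 
approximately have the distribution of the number of failures in $r + \lambda$ independent 
trials when each trial is a failure with probability $1 - p = \frac{\lambda}{r + \lambda}$. But by the 
Poisson limit of the binomial, such a random variable should approximately be 
Poisson with mean $(r + \lambda)\frac{\lambda}{r + \lambda} = \lambda$. That is, as $r \to \infty$, we might expect that the 
distribution of X converges to that of a Poisson random variable with mean $\lambda$. We 
now show that this is indeed true. 


Because X = k if the rth success occurs on trial r + k, we see that 

\[
\begin{aligned}
P(X = k) &= P(Y = r + k) \\
&= \binom{r + k - 1}{r - 1} p^r (1 - p)^k
\end{aligned}
\]

When $p = \frac{r}{r + \lambda}$, so that $1 - p = \frac{\lambda}{r + \lambda}$, we have that 

\[
\begin{aligned}
\binom{r + k - 1}{r - 1}(1 - p)^k &= \binom{r + k - 1}{k}\left(\frac{\lambda}{r + \lambda}\right)^k \\
&= \frac{(r + k - 1)(r + k - 2) \cdots r}{k!} \cdot \frac{\lambda^k}{(r + \lambda)^k} \\
&= \frac{\lambda^k}{k!} \cdot \frac{r + k - 1}{r + \lambda} \cdot \frac{r + k - 2}{r + \lambda} \cdots \frac{r}{r + \lambda} \\
&\to \frac{\lambda^k}{k!} \quad \text{as } r \to \infty
\end{aligned}
\]

Also, 

\[ p^r = \frac{1}{\left(\frac{r + \lambda}{r}\right)^r} = \frac{1}{\left(1 + \frac{\lambda}{r}\right)^r} \to e^{-\lambda} \quad \text{as } r \to \infty \]

Thus, we see that 

\[ P(X = k) \to e^{-\lambda}\frac{\lambda^k}{k!} \quad \text{as } r \to \infty \]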
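
The convergence can be observed numerically by comparing the two probability mass functions for increasing r. This is a rough sketch with illustrative names; the choice $\lambda = 2$ and the range of k are arbitrary.

```python
import math


def failures_pmf(k, r, lam):
    """P(X = k): k failures before the r-th success, with success probability r/(r+lam)."""
    p = r / (r + lam)
    return math.comb(r + k - 1, k) * p ** r * (1 - p) ** k


def poisson_pmf(k, lam):
    """Poisson probability e^{-lam} lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_gap(r, lam, k_max=10):
    """Largest absolute difference between the two pmfs over k = 0..k_max."""
    return max(abs(failures_pmf(k, r, lam) - poisson_pmf(k, lam)) for k in range(k_max + 1))


if __name__ == "__main__":
    for r in (5, 50, 500, 5000):
        print(r, max_gap(r, 2.0))
```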


8.6 Bounding the Error Probability When Approximating a Sum of Independent Bernoulli Random Variables by a Poisson Random Variable 


In this section, we establish bounds on how closely a sum of independent Bernoulli 
random variables is approximated by a Poisson random variable with the same 
mean. Suppose that we want to approximate the sum of independent Bernoulli 
random variables with respective means $p_1, p_2, \ldots, p_n$. Starting with a sequence 
$Y_1, \ldots, Y_n$ of independent Poisson random variables, with $Y_i$ having mean $p_i$, we will 
construct a sequence of independent Bernoulli random variables $X_1, \ldots, X_n$ with 
parameters $p_1, \ldots, p_n$ such that 

\[ P\{X_i \ne Y_i\} \le p_i^2 \quad \text{for each } i \]

Letting $X = \sum_{i=1}^{n} X_i$ and $Y = \sum_{i=1}^{n} Y_i$, we will use the preceding inequality to conclude 
that 

\[ P\{X \ne Y\} \le \sum_{i=1}^{n} p_i^2 \]

Finally, we will show that the preceding inequality implies that for any set of real 
numbers A, 

\[ |P\{X \in A\} - P\{Y \in A\}| \le \sum_{i=1}^{n} p_i^2 \]

Since X is the sum of independent Bernoulli random variables and Y is a Poisson 
random variable, the latter inequality will yield the desired bound. 


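To show how the task is accomplished, let $Y_i$, $i = 1, \ldots, n$ be independent Poisson 
random variables with respective means $p_i$. Now let $U_1, \ldots, U_n$ be independent 
random variables that are also independent of the $Y_i$'s and are such that 

\[ U_i = \begin{cases} 0 & \text{with probability } (1 - p_i)e^{p_i} \\ 1 & \text{with probability } 1 - (1 - p_i)e^{p_i} \end{cases} \]

This definition implicitly makes use of the inequality 

\[ e^{-p} \ge 1 - p \]

in assuming that $(1 - p_i)e^{p_i} \le 1$. 
Next, define the random variables $X_i$, $i = 1, \ldots, n$, by 

\[ X_i = \begin{cases} 0 & \text{if } Y_i = U_i = 0 \\ 1 & \text{otherwise} \end{cases} \]

Note that 

\[
\begin{aligned}
P\{X_i = 0\} &= P\{Y_i = 0\}P\{U_i = 0\} = e^{-p_i}(1 - p_i)e^{p_i} = 1 - p_i \\
P\{X_i = 1\} &= 1 - P\{X_i = 0\} = p_i
\end{aligned}
\]

Now, if $X_i$ is equal to 0, then so must $Y_i$ equal 0 (by the definition of $X_i$). Therefore, 

\[
\begin{aligned}
P\{X_i \ne Y_i\} &= P\{X_i = 1, Y_i \ne 1\} \\
&= P\{Y_i = 0, X_i = 1\} + P\{Y_i > 1\} \\
&= P\{Y_i = 0, U_i = 1\} + P\{Y_i > 1\} \\
&= e^{-p_i}[1 - (1 - p_i)e^{p_i}] + 1 - e^{-p_i} - p_i e^{-p_i} \\
&= p_i - p_i e^{-p_i} \\
&\le p_i^2 \quad (\text{since } 1 - e^{-p} \le p)
\end{aligned}
\]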


Now let $X = \sum_{i=1}^{n} X_i$ and $Y = \sum_{i=1}^{n} Y_i$, and note that X is the sum of independent 
Bernoulli random variables and Y is Poisson with the expected value 
$E[Y] = E[X] = \sum_{i=1}^{n} p_i$. Note also that the inequality $X \ne Y$ implies that $X_i \ne Y_i$ for 
some i, so 

\[
\begin{aligned}
P\{X \ne Y\} &\le P\{X_i \ne Y_i \text{ for some } i\} \\
&\le \sum_{i=1}^{n} P\{X_i \ne Y_i\} \quad \text{(Boole's inequality)} \\
&\le \sum_{i=1}^{n} p_i^2
\end{aligned}
\]


For any event B, let $I_B$, the indicator variable for the event B, be defined by 

\[ I_B = \begin{cases} 1 & \text{if } B \text{ occurs} \\ 0 & \text{otherwise} \end{cases} \]

Note that for any set of real numbers A, 

\[ I_{\{X \in A\}} - I_{\{Y \in A\}} \le I_{\{X \ne Y\}} \]

The preceding inequality follows from the fact that since an indicator variable is either 
0 or 1, the left-hand side equals 1 only when $I_{\{X \in A\}} = 1$ and $I_{\{Y \in A\}} = 0$. But this 
would imply that $X \in A$ and $Y \notin A$, which means that $X \ne Y$, so the right side would 
also equal 1. Upon taking expectations of the preceding inequality, we obtain 

\[ P\{X \in A\} - P\{Y \in A\} \le P\{X \ne Y\} \]

By reversing X and Y, we obtain, in the same manner, 

\[ P\{Y \in A\} - P\{X \in A\} \le P\{X \ne Y\} \]

Thus, we can conclude that 

\[ |P\{X \in A\} - P\{Y \in A\}| \le P\{X \ne Y\} \]

Therefore, we have proven that with $\lambda = \sum_{i=1}^{n} p_i$, 

\[ \left| P\left\{\sum_{i=1}^{n} X_i \in A\right\} - \sum_{i \in A} \frac{e^{-\lambda}\lambda^i}{i!} \right| \le \sum_{i=1}^{n} p_i^2 \]

Remark When all the $p_i$ are equal to p, X is a binomial random variable. Hence, the 
preceding inequality shows that, for any set of nonnegative integers A, 

\[ \left| \sum_{i \in A} \binom{n}{i} p^i (1 - p)^{n - i} - \sum_{i \in A} \frac{e^{-np}(np)^i}{i!} \right| \le np^2 \]
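
The bound in the Remark can be checked numerically. The sketch below relies on a standard fact not stated in the text: the largest value of $|P\{X \in A\} - P\{Y \in A\}|$ over all sets A is attained by taking A to be the set of points where the binomial probability exceeds the Poisson probability, so it equals the sum of the positive parts of the differences. The function name is illustrative, and n is kept moderate so the floating-point factorials stay in range.

```python
import math


def max_set_discrepancy(n, p):
    """Max over sets A of |P{Bin(n, p) in A} - P{Poisson(np) in A}|."""
    lam = n * p
    positive_part = 0.0
    for i in range(n + 1):  # beyond n the binomial mass is 0, so nothing is added
        binom = math.comb(n, i) * p ** i * (1 - p) ** (n - i)
        poisson = math.exp(-lam) * lam ** i / math.factorial(i)
        positive_part += max(binom - poisson, 0.0)
    return positive_part


if __name__ == "__main__":
    for n, p in ((20, 0.1), (100, 0.05), (100, 0.01)):
        print(n, p, max_set_discrepancy(n, p), n * p * p)
```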


8.7 The Lorenz Curve 


The Lorenz curve L(p), 0 < p < 1 is a plot of the fraction of the total income of a 
population that is earned by the 100p percent of the population having the lowest 
incomes. For instance, L(.5) is the fraction of total income earned by the lower half of 
income earners. Suppose that the earnings of the members of a population can be 
represented by the quantities $X_1, X_2, \ldots$ where the $X_i$ are independent and identically 
distributed positive continuous random variables with distribution function F. Now, let 
X be a random variable with distribution F, and define $\xi_p$ to be that value such that 

\[ P\{X \le \xi_p\} = F(\xi_p) = p \]

The quantity $\xi_p$ is called the 100p percentile of the distribution F. With I(x) defined by 

\[ I(x) = \begin{cases} 1, & \text{if } x < \xi_p \\ 0, & \text{if } x \ge \xi_p \end{cases} \]

it follows that $\frac{I(X_1) + \cdots + I(X_n)}{n}$ is the fraction of the first n members of the 
population that have incomes less than $\xi_p$. Upon letting $n \to \infty$, and applying the 
strong law of large numbers to the independent and identically distributed random 
variables $I(X_k)$, $k \ge 1$, the preceding yields that, with probability 1, 

\[ \lim_{n \to \infty} \frac{I(X_1) + \cdots + I(X_n)}{n} = E[I(X)] = F(\xi_p) = p \]

That is, with probability 1, p is the fraction of the population whose income is less 
than $\xi_p$. The fraction of the total income earned by those earning less than $\xi_p$ can be 
obtained by noting that the fraction of the total income of the first n members of the 
population that is from those earning less than $\xi_p$ is $\frac{X_1 I(X_1) + \cdots + X_n I(X_n)}{X_1 + \cdots + X_n}$. Letting 
$n \to \infty$ yields that 

\[ L(p) = \lim_{n \to \infty} \frac{\dfrac{X_1 I(X_1) + \cdots + X_n I(X_n)}{n}}{\dfrac{X_1 + \cdots + X_n}{n}} = \frac{E[XI(X)]}{E[X]}, \]

where the final equality was obtained by applying the strong law of large numbers to 
both the numerator and denominator of the preceding fraction. Letting $\mu = E[X]$, 
writing f for the density function of F, and noting that 

\[ E[XI(X)] = \int_0^{\infty} x I(x) f(x)\,dx = \int_0^{\xi_p} x f(x)\,dx \]

shows that 

\[ L(p) = \frac{E[XI(X)]}{E[X]} = \frac{1}{\mu}\int_0^{\xi_p} x f(x)\,dx \tag{7.1} \]

Example 7a 


If F is the distribution function of a uniform random variable on (a, b), where 
$0 \le a < b$, then 

\[ F(x) = \int_a^x \frac{1}{b - a}\,dx = \frac{x - a}{b - a}, \quad a < x < b \]

Because $p = F(\xi_p) = \frac{\xi_p - a}{b - a}$, we see that $\xi_p = a + (b - a)p$. Because the mean 
of a uniform (a, b) random variable is (a + b)/2, we obtain from Equation 
(7.1) that 

\[
\begin{aligned}
L(p) &= \frac{2}{a + b}\int_a^{a + (b - a)p} \frac{x}{b - a}\,dx \\
&= \frac{(a + (b - a)p)^2 - a^2}{(a + b)(b - a)} \\
&= \frac{2pa + (b - a)p^2}{a + b}
\end{aligned}
\]

When a = 0, the preceding gives that $L(p) = p^2$. Also, letting a converge to b 
gives that 

\[ \lim_{a \to b} L(p) = p \]

which can be interpreted as saying that L(p) = p when all members of the 
population earn the same amount. 


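A useful formula for L(p) can be obtained by letting 

\[ J(x) = 1 - I(x) = \begin{cases} 0, & \text{if } x < \xi_p \\ 1, & \text{if } x \ge \xi_p \end{cases} \]

and then noting that 

\[ 1 - L(p) = \frac{E[X] - E[XI(X)]}{E[X]} = \frac{E[XJ(X)]}{E[X]} \]

Conditioning on J(X) gives that 

\[
\begin{aligned}
E[XJ(X)] &= E[XJ(X) \mid J(X) = 1]P(J(X) = 1) + E[XJ(X) \mid J(X) = 0]P(J(X) = 0) \\
&= E[X \mid X \ge \xi_p](1 - p)
\end{aligned}
\]

which shows that 

\[ 1 - L(p) = \frac{E[X \mid X \ge \xi_p](1 - p)}{E[X]} \tag{7.2} \]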


Example 7b 

If F is the distribution function of an exponential random variable with mean 1, 
then $p = F(\xi_p) = 1 - e^{-\xi_p}$, and so $\xi_p = -\log(1 - p)$. Because the lack of 
memory property of the exponential implies that 
$E[X \mid X \ge \xi_p] = \xi_p + E[X] = \xi_p + 1$, we obtain from Equation (7.2) that the 
fraction of all income that is earned by those earning at least $\xi_p$ is 

\[
\begin{aligned}
1 - L(p) &= (\xi_p + 1)(1 - p) \\
&= (1 - \log(1 - p))(1 - p) \\
&= 1 - p - (1 - p)\log(1 - p)
\end{aligned}
\]

giving that 

\[ L(p) = p + (1 - p)\log(1 - p) \]


Example 7c 


If F is the distribution function of a Pareto random variable with parameters $\lambda > 0$, 
a > 0, then $F(x) = 1 - \frac{a^\lambda}{x^\lambda}$, $x \ge a$. Consequently, $p = F(\xi_p) = 1 - \frac{a^\lambda}{\xi_p^\lambda}$, giving that 

\[ \xi_p^\lambda = \frac{a^\lambda}{1 - p} \quad \text{or} \quad \xi_p = a(1 - p)^{-1/\lambda} \]

When $\lambda > 1$, it was shown in Section 5.6.5 that $E[X] = \frac{\lambda a}{\lambda - 1}$. In addition, it 
was shown in Example 5f of Chapter 6 that if X is Pareto with parameters 
$\lambda$, a then the conditional distribution of X given that it exceeds $x_0$, $x_0 \ge a$, is Pareto 
with parameters $\lambda$, $x_0$. Consequently, when $\lambda > 1$, $E[X \mid X > \xi_p] = \frac{\lambda \xi_p}{\lambda - 1}$, and thus 
Equation (7.2) yields that 

\[ 1 - L(p) = \frac{E[X \mid X > \xi_p](1 - p)}{E[X]} = \frac{\xi_p(1 - p)}{a} = (1 - p)^{1 - 1/\lambda} \]

or 

\[ L(p) = 1 - (1 - p)^{\frac{\lambda - 1}{\lambda}} \]

We now prove some properties of the function L(p). 


Proposition 7.1 


L(p) is an increasing, convex function of p, such that $L(p) \le p$. 


Proof That L(p) increases in p follows from its definition. To prove convexity we 
must show that L(p + a) - L(p) increases in p for $p \le 1 - a$; or equivalently, that 
the proportion of the total income earned by those with incomes between $\xi_p$ and 
$\xi_{p+a}$ increases in p. But this follows because, for all p, the same proportion of the 
population - namely, 100a percent - earns between $\xi_p$ and $\xi_{p+a}$, and both $\xi_p$ and 
$\xi_{p+a}$ increase in p. (For instance, 10 percent of the population have incomes in the 
40 to 50 percentile and 10 percent of the population have incomes in the 45 to 55 
percentile, and as the incomes earned by the 5 percent of the population in the 
40 to 45 percentile are all less than those earned by the 5 percent of the 
population in the 50 to 55 percentile, it follows that the proportion of the 
population income of those in the 40 to 50 percentile is less than the proportion 
of those in the 45 to 55 percentile.) To establish that $L(p) \le p$, we see from 
Equation (7.1) that we need to show that $E[XI(X)] \le E[X]p$. But this follows 
because I(x), equal to 1 if $x < \xi_p$ and to 0 if $x \ge \xi_p$, is a decreasing and h(x) = x 
is an increasing function of x, which from Proposition 5.4 implies that 
$E[XI(X)] \le E[X]E[I(X)] = E[X]p$. 


Because $L(p) \le p$ with L(p) = p when all members of the population have the same 
income, the area of the "hump", equal to the region between the straight line and the 
Lorenz curve (the shaded region in Figure 8.3), is an indication of the inequality 
of incomes. 


Figure 8.3 The Hump of the Lorenz Curve 
[Plot of L(p) against p for $0 \le p \le 1$; the hump is the shaded region between the line L(p) = p and the curve.] 


A measure of the inequality of the incomes is given by the Gini index, which is the 
ratio of the area of the hump divided by the area under the straight line L(p) = p. 
Because the area of a triangle is one half its base times its height, it follows that the 
Gini index, call it G, is given by 


\[ G = \frac{1/2 - \int_0^1 L(p)\,dp}{1/2} = 1 - 2\int_0^1 L(p)\,dp \]


Example 7d 

Find the Gini index when F, the distribution of earnings for an individual in the 
population, is uniform on (0, b), and when F is exponential with rate $\lambda$. 
Solution 


When F is the uniform (0, b) distribution, then as shown in Example 7a, 
$L(p) = p^2$, giving that G = 1 - 2/3 = 1/3. When F is exponential, the Lorenz curve 
does not depend on the rate $\lambda$ (rescaling all incomes leaves income shares 
unchanged), so from Example 7b 

\[
\begin{aligned}
\int_0^1 L(p)\,dp &= \int_0^1 (p + (1 - p)\log(1 - p))\,dp \\
&= \frac{1}{2} + \int_0^1 x\log(x)\,dx
\end{aligned}
\]

Integrating by parts with u = log x, dv = x dx shows that 

\[ \int_0^1 x\log(x)\,dx = -\int_0^1 \frac{x}{2}\,dx = -1/4 \]

where L'Hôpital's rule was used to obtain that $\lim_{x \to 0} x^2\log(x) = 0$. Hence, 
$\int_0^1 L(p)\,dp = 1/4$, giving that G = 1/2. Because larger values of G indicate more 
inequality, we see that the inequality is larger when the distribution is exponential 
than when it is uniform. 
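
The two Gini indices can also be obtained by numerical integration of $G = 1 - 2\int_0^1 L(p)\,dp$. The sketch below uses a midpoint rule, which is a choice made here so that the logarithm is never evaluated at p = 1; the function name `gini` is illustrative.

```python
import math


def gini(lorenz, steps=100_000):
    """Approximate G = 1 - 2 * integral_0^1 L(p) dp with the midpoint rule."""
    h = 1.0 / steps
    area = sum(lorenz((k + 0.5) * h) for k in range(steps)) * h
    return 1 - 2 * area


if __name__ == "__main__":
    print(gini(lambda p: p * p))                            # uniform (0, b)
    print(gini(lambda p: p + (1 - p) * math.log(1 - p)))    # exponential
```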


Summary 


Two useful probability bounds are provided by the Markov and Chebyshev 
inequalities. The Markov inequality is concerned with nonnegative random variables 
and says that for X of that type, 

\[ P\{X \ge a\} \le \frac{E[X]}{a} \]

for every positive value a. The Chebyshev inequality, which is a simple consequence 
of the Markov inequality, states that if X has mean $\mu$ and variance $\sigma^2$, then, for every 
positive k, 

\[ P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2} \]

The two most important theoretical results in probability are the central limit theorem 
and the strong law of large numbers. Both are concerned with a sequence of 
independent and identically distributed random variables. The central limit theorem 
says that if the random variables have a finite mean $\mu$ and a finite variance $\sigma^2$, then 
the distribution of the sum of the first n of them is, for large n, approximately that of a 
normal random variable with mean $n\mu$ and variance $n\sigma^2$. That is, if $X_i$, $i \ge 1$, is the 
sequence, then the central limit theorem states that for every real number a, 

\[ \lim_{n \to \infty} P\left\{\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le a\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{a} e^{-x^2/2}\,dx \]

The strong law of large numbers requires only that the random variables in the 
sequence have a finite mean $\mu$. It states that with probability 1, the average of the 
first n of them will converge to $\mu$ as n goes to infinity. This implies that if A is any 
specified event of an experiment for which independent replications are performed, 
then the limiting proportion of experiments whose outcomes are in A will, with 
probability 1, equal P(A). Therefore, if we accept the interpretation that "with 
probability 1" means "with certainty," we obtain the theoretical justification for the 
long-run relative frequency interpretation of probabilities. 


Problems 


8.1. Suppose that X is a random variable with mean and variance 
both equal to 20. What can be said about P{0 < X < 40}? 
8.2. From past experience, a professor knows that the test score of a 
student taking her final examination is a random variable with mean 
75. 
a. Give an upper bound for the probability that a student’s test 
score will exceed 85. 
b. Suppose, in addition, that the professor knows that the 
variance of a student’s test score is equal to 25. What can be 
said about the probability that a student will score between 65 
and 85? 
c. How many students would have to take the examination to 
ensure with probability at least .9 that the class average would 
be within 5 of 75? Do not use the central limit theorem. 


8.3. Use the central limit theorem to solve part (c) of Problem 8.2. 
8.4. Let $X_1, \ldots, X_{20}$ be independent Poisson random variables with 
mean 1. 
a. Use the Markov inequality to obtain a bound on 

\[ P\left\{\sum_{i=1}^{20} X_i > 15\right\} \]

b. Use the central limit theorem to approximate 

\[ P\left\{\sum_{i=1}^{20} X_i > 15\right\} \]

8.5. Fifty numbers are rounded off to the nearest integer and then 
summed. If the individual round-off errors are uniformly distributed 
over ( —.5,.5), approximate the probability that the resultant sum 
differs from the exact sum by more than 3. 

8.6. A die is continually rolled until the total sum of all rolls exceeds 
300. Approximate the probability that at least 80 rolls are necessary. 
8.7. A person has 100 light bulbs whose lifetimes are independent 
exponentials with mean 5 hours. If the bulbs are used one at a time, 
with a failed bulb being replaced immediately by a new one, 
approximate the probability that there is still a working bulb after 525 
hours. 

8.8. In Problem 8.7, suppose that it takes a random time,
uniformly distributed over (0, .5), to replace a failed bulb. 
Approximate the probability that all bulbs have failed by time 550. 
8.9. If X is a gamma random variable with parameters (n, 1),
approximately how large must n be so that

$$P\left\{\left|\frac{X}{n} - 1\right| > .01\right\} < .01?$$


8.10. Civil engineers believe that W, the amount of weight (in units of 
1000 pounds) that a certain span of a bridge can withstand without 
structural damage resulting, is normally distributed with mean 400 
and standard deviation 40. Suppose that the weight (again, in units 
of 1000 pounds) of a car is a random variable with mean 3 and 
standard deviation .3. Approximately how many cars would have to 
be on the bridge span for the probability of structural damage to 
exceed .1? 

8.11. Many people believe that the daily change of price of a
company’s stock on the stock market is a random variable with mean
0 and variance $\sigma^2$. That is, if $Y_n$ represents the price of the stock on
the nth day, then

$$Y_n = Y_{n-1} + X_n \qquad n \ge 1$$

where $X_1, X_2, \ldots$ are independent and identically distributed random
variables with mean 0 and variance $\sigma^2$. Suppose that the stock’s
price today is 100. If $\sigma^2 = 1$, what can you say about the probability
that the stock’s price will exceed 105 after 10 days?

8.12. We have 100 components that we will put in use in a sequential
fashion. That is, component 1 is initially put in use, and upon failure,
it is replaced by component 2, which is itself replaced upon failure by 
component 3, and so on. If the lifetime of component i is 
exponentially distributed with mean 10 + i/10,i = 1,... ,100, estimate 
the probability that the total life of all components will exceed 1200. 
Now repeat when the life distribution of component i is uniformly 
distributed over (0,20 + i/5),i = 1,...,100. 

8.13. Student scores on exams given by a certain instructor have 
mean 74 and standard deviation 14. This instructor is about to give 
two exams, one to a class of size 25 and the other to a class of size 
64. 

a. Approximate the probability that the average test score in the 
class of size 25 exceeds 80. 

b. Repeat part (a) for the class of size 64. 

c. Approximate the probability that the average test score in the 
larger class exceeds that of the other class by more than 2.2 
points. 

d. Approximate the probability that the average test score in the 
smaller class exceeds that of the other class by more than 2.2 
points. 


8.14. A certain component is critical to the operation of an electrical 
system and must be replaced immediately upon failure. If the mean 
lifetime of this type of component is 100 hours and its standard 
deviation is 30 hours, how many of these components must be in 
stock so that the probability that the system is in continual operation 
for the next 2000 hours is at least .95? 
8.15. An insurance company has 10,000 automobile policyholders. 
The expected yearly claim per policyholder is $240, with a standard 
deviation of $800. Approximate the probability that the total yearly 
claim exceeds $2.7 million. 
8.16. A.J. has 20 jobs that she must do in sequence, with the times 
required to do each of these jobs being independent random 
variables with mean 50 minutes and standard deviation 10 minutes. 
M.J. has 20 jobs that he must do in sequence, with the times 
required to do each of these jobs being independent random 
variables with mean 52 minutes and standard deviation 15 minutes. 
a. Find the probability that A.J. finishes in less than 900 minutes. 
b. Find the probability that M.J. finishes in less than 900 minutes. 
c. Find the probability that A.J. finishes before M.J. 


8.17. Redo Example 5b under the assumption that the number of
man—woman pairs is (approximately) normally distributed. Does this 
seem like a reasonable supposition? 
8.18. Repeat part (a) of Problem 8.2 when it is known that the
variance of a student’s test score is equal to 25. 
8.19. A lake contains 4 distinct types of fish. Suppose that each fish 
caught is equally likely to be any one of these types. Let Y denote the 
number of fish that need be caught to obtain at least one of each 
type. 
a. Give an interval (a, b) such that P{a < Y < b} > .90. 
b. Using the one-sided Chebyshev inequality, how many fish 
need we plan on catching so as to be at least 90 percent 
certain of obtaining at least one of each type? 


8.20. If X is a nonnegative random variable with mean 25, what can
be said about

a. $E[X^3]$?

b. $E[\sqrt{X}]$?

c. $E[\log X]$?

d. $E[e^{-X}]$?


8.21. Let X be a nonnegative random variable. Prove that

$$E[X] \le (E[X^2])^{1/2} \le (E[X^3])^{1/3} \le \cdots$$


8.22. Would the results of Example 5f change if the investor were
allowed to divide her money and invest the fraction $\alpha$, $0 < \alpha < 1$, in
the risky proposition and invest the remainder in the risk-free
venture? Her return for such a split investment would be
$R = \alpha X + (1 - \alpha)m$.
8.23. Let X be a Poisson random variable with mean 20. 

a. Use the Markov inequality to obtain an upper bound on 

p = P{X > 26} 


b. Use the one-sided Chebyshev inequality to obtain an upper 
bound on p. 

c. Use the Chernoff bound to obtain an upper bound on p. 

d. Approximate p by making use of the central limit theorem. 

e. Determine p by running an appropriate program. 
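
For part (e), one possible "appropriate program" is sketched below. It is only an illustration: the helper names `poisson_pmf` and `poisson_tail` are assumptions introduced here, and the threshold follows the inequality as printed above, P{X > 26}. The tail is obtained by subtracting the cumulative sum of the Poisson mass function from 1, and the Markov ratio E[X]/26 is printed alongside for comparison.

```python
import math

def poisson_pmf(k, lam):
    # P{X = k} for a Poisson random variable with mean lam,
    # evaluated in log space so that large k does not overflow.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_tail(threshold, lam):
    # P{X > threshold} = 1 - sum_{k=0}^{threshold} P{X = k}
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(threshold + 1))

p = poisson_tail(26, 20.0)        # tail probability as the problem is printed
markov_bound = 20.0 / 26          # E[X]/a, which bounds P{X >= 26}
print(p, markov_bound)
```

If the intended event is instead P{X ≥ 26}, calling `poisson_tail(25, 20.0)` gives that value.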


8.24. If X is a Poisson random variable with mean 100, then
P{X > 120} is approximately
a. .02,
b. .5 or
c. .3?


8.25. Suppose that the distribution of earnings of members of a
population is Pareto with parameters $\lambda, a > 0$, where

$$a = \frac{\log(5)}{\log(4)} \approx 1.161.$$

a. Show that the top 20 percent of earners earn 80 percent of the
total earnings.

b. Show that the top 20 percent of the top 20 percent of earners 
earn 80 percent of the earnings of the top 20 percent of 
earners. (That is, show that the top 4 percent of all earners 
earn 80 percent of the total earnings of the top 20 percent of 
all earners.) 


8.26. If f(x) is an increasing and g(x) is a decreasing function, show
that $E[f(X)g(X)] \le E[f(X)]E[g(X)]$.
8.27. If L(p) is the Lorenz curve associated with the random variable
X, show that

$$L(p) = \frac{E[X \mid X < \xi_p]\,p}{E[X]}$$

8.28. Suppose that L(p) is the Lorenz curve associated with the
random variable X and that c > 0.
a. Find the Lorenz curve associated with the random variable cX.
b. Show that $L_c(p)$, the Lorenz curve associated with the random
variable X + c, is

$$L_c(p) = \frac{L(p)E[X] + pc}{E[X] + c}$$

c. Verify that the answer to part (b) is in accordance with the
formulas given in Example 7a in the case that X is uniform
over the interval (0, b − a) and c = a.


Theoretical Exercises 


8.1. If X has variance $\sigma^2$, then $\sigma$, the positive square root of the variance, is
called the standard deviation. If X has mean $\mu$ and standard deviation $\sigma$, show
that

$$P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}$$

8.2. If X has mean $\mu$ and standard deviation $\sigma$, the ratio $r \equiv |\mu|/\sigma$ is called the
measurement signal-to-noise ratio of X. The idea is that X can be expressed
as $X = \mu + (X - \mu)$, with $\mu$ representing the signal and $X - \mu$ the noise. If we
define $|(X - \mu)/\mu| \equiv D$ as the relative deviation of X from its signal (or mean)
$\mu$, show that for $\alpha > 0$,

$$P\{D \le \alpha\} \ge 1 - \frac{1}{r^2\alpha^2}$$


8.3. Compute the measurement signal-to-noise ratio (that is, $|\mu|/\sigma$, where
$\mu = E[X]$ and $\sigma^2 = \mathrm{Var}(X)$) of the following random variables:

a. Poisson with mean $\lambda$;

b. binomial with parameters n and p;

c. geometric with mean 1/p;

d. uniform over (a, b);

e. exponential with mean $1/\lambda$;

f. normal with parameters $\mu, \sigma^2$.


8.4. Let $Z_n$, $n \ge 1$, be a sequence of random variables and c a constant such
that for each $\varepsilon > 0$, $P\{|Z_n - c| > \varepsilon\} \to 0$ as $n \to \infty$. Show that for any bounded
continuous function g,

$$E[g(Z_n)] \to g(c) \quad \text{as } n \to \infty$$
8.5. Let f(x) be a continuous function defined for $0 \le x \le 1$. Consider the
functions

$$B_n(x) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right)\binom{n}{k} x^k (1 - x)^{n-k}$$

(called Bernstein polynomials) and prove that

$$\lim_{n \to \infty} B_n(x) = f(x)$$

Hint: Let $X_1, X_2, \ldots$ be independent Bernoulli random variables with mean x.
Show that

$$B_n(x) = E\left[f\left(\frac{X_1 + \cdots + X_n}{n}\right)\right]$$

and then use Theoretical Exercise 8.4.

Since it can be shown that the convergence of $B_n(x)$ to f(x) is uniform in x,
the preceding reasoning provides a probabilistic proof of the famous
Weierstrass theorem of analysis, which states that any continuous function on
a closed interval can be approximated arbitrarily closely by a polynomial.
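
The Bernstein polynomial in Theoretical Exercise 8.5 is easy to evaluate directly, which gives a feel for the convergence before proving it. The sketch below is an illustration under stated assumptions: the function name `bernstein`, the test function |x − 1/2|, and the grid of 51 points are choices made here, not taken from the text.

```python
import math

def bernstein(f, n, x):
    # B_n(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k),
    # i.e. E[f((X_1 + ... + X_n)/n)] for Bernoulli(x) summands.
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def max_error(f, n, grid_points=51):
    # Largest gap between B_n and f over an evenly spaced grid on [0, 1].
    xs = [i / (grid_points - 1) for i in range(grid_points)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

f = lambda x: abs(x - 0.5)        # continuous but not differentiable at 1/2
for n in (10, 100, 1000):
    print(n, max_error(f, n))
```

The printed maximum errors should shrink as n increases, in line with the uniform convergence mentioned in the exercise; the numerical check is of course not a substitute for the proof.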

8.6.
a. Let X be a discrete random variable whose possible values are 1, 2, ....
If P{X = k} is nonincreasing in k = 1, 2, ..., prove that

$$P\{X = k\} \le \frac{2E[X]}{k^2}$$

b. Let X be a nonnegative continuous random variable having a
nonincreasing density function. Show that

$$f(x) \le \frac{2E[X]}{x^2} \quad \text{for all } x > 0$$


8.7. Suppose that a fair die is rolled 100 times. Let $X_i$ be the value obtained
on the ith roll. Compute an approximation for

$$P\left\{\prod_{i=1}^{100} X_i \le a^{100}\right\} \qquad 1 < a < 6$$


8.8. Explain why a gamma random variable with parameters $(t, \lambda)$ has an
approximately normal distribution when t is large.

8.9. Suppose a fair coin is tossed 1000 times. If the first 100 tosses all result
in heads, what proportion of heads would you expect on the final 900 tosses?
Comment on the statement “The strong law of large numbers swamps but
does not compensate.”

8.10. If X is a Poisson random variable with mean $\lambda$, show that for $i < \lambda$,

$$P\{X \le i\} \le \frac{e^{-\lambda}(e\lambda)^i}{i^i}$$


8.11. Let X be a binomial random variable with parameters n and p. Show that,
for i > np,
a. the minimum of $e^{-ti}E[e^{tX}]$ occurs when t is such that $e^t = \frac{iq}{(n-i)p}$, where
q = 1 − p.
b. $P\{X \ge i\} \le \frac{n^n}{i^i(n-i)^{n-i}}\, p^i (1-p)^{n-i}$.


8.12. The Chernoff bound on a standard normal random variable Z gives
$P\{Z > a\} \le e^{-a^2/2}$, $a > 0$. Show, by considering the density of Z, that the right
side of the inequality can be reduced by the factor 2. That is, show that

$$P\{Z > a\} \le \frac{1}{2}e^{-a^2/2} \qquad a > 0$$


8.13. Show that if $E[X] < 0$ and $\theta \ne 0$ is such that $E[e^{\theta X}] = 1$, then $\theta > 0$.
8.14. Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed
random variables with distribution F, having a finite mean and variance.
Whereas the central limit theorem states that the distribution of $\sum_{i=1}^{n} X_i$
approaches a normal distribution as n goes to infinity, it gives us no
information about how large n need be before the normal becomes a good
approximation. Whereas in most applications, the approximation yields good
results whenever $n \ge 20$, and oftentimes for much smaller values of n, how
large a value of n is needed depends on the distribution of $X_i$. Give an
example of a distribution F such that the distribution of $\sum_{i=1}^{100} X_i$ is not close to
a normal distribution.

Hint: Think Poisson.

8.15. If f and g are density functions that are positive over the same region,
then the Kullback–Leibler divergence from density f to density g is defined by

$$KL(f, g) = E_f\left[\log\left(\frac{f(X)}{g(X)}\right)\right] = \int \log\left(\frac{f(x)}{g(x)}\right) f(x)\,dx$$

where the notation $E_f[h(X)]$ is used to indicate that X has density function f.
a. Show that KL(f, f) = 0.
b. Use Jensen’s inequality and the identity $\log\left(\frac{f(x)}{g(x)}\right) = -\log\left(\frac{g(x)}{f(x)}\right)$ to
show that $KL(f, g) \ge 0$.


8.16. Let L(p) be the Lorenz curve associated with the distribution function F,
with density function f and mean $\mu$.
a. Show that

$$L(p) = \frac{1}{\mu}\int_0^p F^{-1}(y)\,dy$$

Hint: Starting with $L(p) = \frac{1}{\mu}\int_0^{\xi_p} x f(x)\,dx$, make the change of variable
y = F(x).
b. Use part (a) to show that L(p) is convex.
c. Show that

$$\int_0^1 L(p)\,dp = \frac{1}{\mu}\int_0^\infty (1 - F(x))\,x f(x)\,dx$$

d. Verify the preceding formula by using it to compute the Gini index of a
uniform (0,1) and an exponential random variable, comparing your
answers with those given in Example 7d.


Self-Test Problems and Exercises


8.1. The number of automobiles sold weekly at a certain dealership 
is a random variable with expected value 16. Give an upper bound to 
the probability that 

a. next week’s sales exceed 18; 

b. next week’s sales exceed 25. 


8.2. Suppose in Self-Test Problem 8.1 that the variance of the number of
automobiles sold weekly is 9. 
a. Give a lower bound to the probability that next week’s sales 
are between 10 and 22, inclusively. 
b. Give an upper bound to the probability that next week’s sales 


exceed 18. 
8.3. If 
E[X] = 75 E[Y]=75  Var(X) = 10 
Var(Y) = 12 Cov(X,Y) = —3 


give an upper bound to 
a. P{|X —Y| > 15}; 
b. P{X >Y +15}; 
c. P{Y > X +15}. 


8.4. Suppose that the number of units produced daily at factory A is a 
random variable with mean 20 and standard deviation 3 and the 
number produced at factory B is a random variable with mean 18 and 
standard deviation 6. Assuming independence, derive an upper 
bound for the probability that more units are produced today at 
factory B than at factory A. 

8.5. The amount of time that a certain type of component functions
before failing is a random variable with probability density function

$$f(x) = 2x \qquad 0 < x < 1$$

Once the component fails, it is immediately replaced by another one
of the same type. If we let $X_i$ denote the lifetime of the ith component
to be put in use, then $S_n = \sum_{i=1}^{n} X_i$ represents the time of the nth
failure. The long-term rate at which failures occur, call it r, is defined
by

$$r = \lim_{n \to \infty} \frac{n}{S_n}$$

Assuming that the random variables $X_i$, $i \ge 1$, are independent,
determine r.
8.6. In Self-Test Problem 8.5, how many components would one 
need to have on hand to be approximately 90 percent certain that the 
stock would last at least 35 days? 
8.7. The servicing of a machine requires two separate steps, with the 
time needed for the first step being an exponential random variable 
with mean .2 hour and the time for the second step being an 
independent exponential random variable with mean .3 hour. If a 
repair person has 20 machines to service, approximate the 
probability that all the work can be completed in 8 hours. 
8.8. On each bet, a gambler loses 1 with probability .7, loses 2 with 
probability .2, or wins 10 with probability .1. Approximate the 
probability that the gambler will be losing after his first 100 bets. 
8.9. Determine t so that the probability that the repair person in
Self-Test Problem 8.7 finishes the 20 jobs within time t is
approximately equal to .95. 
8.10. A tobacco company claims that the amount of nicotine in one of 
its cigarettes is a random variable with mean 2.2 mg and standard 
deviation .3 mg. However, the average nicotine content of 100 
randomly chosen cigarettes was 3.1 mg. Approximate the probability 
that the average would have been as high as or higher than 3.1 if the 
company’s claims were true. 
8.11. Each of the batteries in a collection of 40 batteries is equally 
likely to be either a type A or a type B battery. Type A batteries last 
for an amount of time that has mean 50 and standard deviation 15; 
type B batteries last for an amount of time that has mean 30 and 
standard deviation 6. 
a. Approximate the probability that the total life of all 40 batteries 
exceeds 1700. 
b. Suppose it is known that 20 of the batteries are type A and 20 
are type B. Now approximate the probability that the total life 
of all 40 batteries exceeds 1700. 


8.12. A clinic is equally likely to have 2, 3, or 4 doctors volunteer for 
service on a given day. No matter how many volunteer doctors there 
are on a given day, the numbers of patients seen by these doctors 
are independent Poisson random variables with mean 30. Let X 
denote the number of patients seen in the clinic on a given day.

a. Find E[X].

b. Find Var(X). 

c. Use a table of the standard normal probability distribution to 
approximate P{X > 65}. 


8.13. The strong law of large numbers states that with probability 1,
the successive arithmetic averages of a sequence of independent
and identically distributed random variables converge to their
common mean $\mu$. What do the successive geometric averages
converge to? That is, what is

$$\lim_{n \to \infty}\left(\prod_{i=1}^{n} X_i\right)^{1/n}$$


8.14. Each new book donated to a library must be processed. 
Suppose that the time it takes to process a book has mean 10 
minutes and standard deviation 3 minutes. If a librarian has 40 books 
to process, 
a. approximate the probability that it will take more than 420 
minutes to process all these books; 
b. approximate the probability that at least 25 books will be 
processed in the first 240 minutes. 
What assumptions have you made? 


8.15. Prove Chebyshev’s sum inequality, which says that if
$a_1 \ge a_2 \ge \cdots \ge a_n$ and $b_1 \ge b_2 \ge \cdots \ge b_n$, then

$$n\sum_{i=1}^{n} a_i b_i \ge \left(\sum_{i=1}^{n} a_i\right)\left(\sum_{i=1}^{n} b_i\right)$$


9.1 The Poisson Process 


Before we define a Poisson process, let us recall that a function f is said to be o(h) if

$$\lim_{h \to 0} \frac{f(h)}{h} = 0$$

That is, f is o(h) if, for small values of h, f(h) is small even in relation to h. Suppose
now that “events” are occurring at random points in time, and let N(t) denote the
number of events that occur in the time interval [0, t]. The collection of random
variables $\{N(t), t \ge 0\}$ is said to be a Poisson process having rate $\lambda$, $\lambda > 0$, if

i. N(0) = 0.
ii. The numbers of events that occur in disjoint time intervals are independent.
iii. The distribution of the number of events that occur in a given interval depends
only on the length of that interval and not on its location.
iv. $P\{N(h) = 1\} = \lambda h + o(h)$.
v. $P\{N(h) \ge 2\} = o(h)$.


Thus, condition (i) states that the process begins at time 0. Condition (ii), the 
independent increment assumption, states, for instance, that the number of events 
that occur by time t [that is, N(t)] is independent of the number of events that occur 
between t and t+ s [that is, N(t + s) — N(t)]. Condition (iii), the stationary increment 
assumption, states that the probability distribution of N(t + s) — N(t) is the same for 
all values of t. 


In Chapter 4, we presented an argument, based on the Poisson distribution being
a limiting version of the binomial distribution, that the foregoing conditions imply that
N(t) has a Poisson distribution with mean $\lambda t$. We will now obtain this result by a
different method.


Lemma 1.1 


For a Poisson process with rate $\lambda$,

$$P\{N(t) = 0\} = e^{-\lambda t}$$

Proof Let $P_0(t) = P\{N(t) = 0\}$. We derive a differential equation for $P_0(t)$ in the
following manner:

$$\begin{aligned}
P_0(t+h) &= P\{N(t+h) = 0\} \\
&= P\{N(t) = 0,\ N(t+h) - N(t) = 0\} \\
&= P\{N(t) = 0\}\,P\{N(t+h) - N(t) = 0\} \\
&= P_0(t)[1 - \lambda h + o(h)]
\end{aligned}$$

where the final two equations follow from condition (ii) plus the fact that
conditions (iv) and (v) imply that $P\{N(h) = 0\} = 1 - \lambda h + o(h)$. Hence,

$$\frac{P_0(t+h) - P_0(t)}{h} = -\lambda P_0(t) + \frac{o(h)}{h}$$

Now, letting $h \to 0$, we obtain

$$P_0'(t) = -\lambda P_0(t)$$

or, equivalently,

$$\frac{P_0'(t)}{P_0(t)} = -\lambda$$

which implies, by integration, that

$$\log P_0(t) = -\lambda t + c$$

or

$$P_0(t) = Ke^{-\lambda t}$$

Since $P_0(0) = P\{N(0) = 0\} = 1$, we arrive at

$$P_0(t) = e^{-\lambda t}$$


For a Poisson process, let $T_1$ denote the time the first event occurs. Further, for
n > 1, let $T_n$ denote the time elapsed between the (n − 1)st and the nth event. The
sequence $\{T_n, n = 1, 2, \ldots\}$ is called the sequence of interarrival times. For instance, if
$T_1 = 5$ and $T_2 = 10$, then the first event of the Poisson process would have occurred
at time 5 and the second at time 15.


We shall now determine the distribution of the $T_n$. To do so, we first note that the
event $\{T_1 > t\}$ takes place if and only if no events of the Poisson process occur in
the interval [0, t]; thus,

$$P\{T_1 > t\} = P\{N(t) = 0\} = e^{-\lambda t}$$

Hence, $T_1$ has an exponential distribution with mean $1/\lambda$. Now,

$$P\{T_2 > t\} = E[P\{T_2 > t \mid T_1\}]$$

However,

$$\begin{aligned}
P\{T_2 > t \mid T_1 = s\} &= P\{0 \text{ events in } (s, s+t] \mid T_1 = s\} \\
&= P\{0 \text{ events in } (s, s+t]\} \\
&= e^{-\lambda t}
\end{aligned}$$

where the last two equations followed from the assumptions about independent and
stationary increments. From the preceding, we conclude that $T_2$ is also an
exponential random variable with mean $1/\lambda$ and, furthermore, that $T_2$ is independent
of $T_1$. Repeating the same argument yields Proposition 1.1.


Proposition 1.1 


$T_1, T_2, \ldots$ are independent exponential random variables, each with mean $1/\lambda$.


Another quantity of interest is $S_n$, the arrival time of the nth event, also called the
waiting time until the nth event. It is easily seen that

$$S_n = \sum_{i=1}^{n} T_i \qquad n \ge 1$$

hence, from Proposition 1.1 and the results of Section 5.6.1, it follows
that $S_n$ has a gamma distribution with parameters n and $\lambda$. That is, the probability
density of $S_n$ is given by

$$f_{S_n}(x) = \lambda e^{-\lambda x}\frac{(\lambda x)^{n-1}}{(n-1)!} \qquad x \ge 0$$


We are now ready to prove that N(t) is a Poisson random variable with mean $\lambda t$.


Theorem 1.1 


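For a Poisson process with rate $\lambda$,

$$P\{N(t) = n\} = \frac{e^{-\lambda t}(\lambda t)^n}{n!}$$

Proof Note that the nth event of the Poisson process will occur before or at time
t if and only if the number of events that occur by t is at least n. That is,

$$N(t) \ge n \iff S_n \le t$$

so

$$\begin{aligned}
P\{N(t) = n\} &= P\{N(t) \ge n\} - P\{N(t) \ge n+1\} \\
&= P\{S_n \le t\} - P\{S_{n+1} \le t\} \\
&= \int_0^t \lambda e^{-\lambda x}\frac{(\lambda x)^{n-1}}{(n-1)!}\,dx - \int_0^t \lambda e^{-\lambda x}\frac{(\lambda x)^n}{n!}\,dx
\end{aligned}$$

But the integration-by-parts formula $\int u\,dv = uv - \int v\,du$ with $u = e^{-\lambda x}$ and
$dv = \lambda[(\lambda x)^{n-1}/(n-1)!]\,dx$ yields

$$\int_0^t \lambda e^{-\lambda x}\frac{(\lambda x)^{n-1}}{(n-1)!}\,dx = e^{-\lambda t}\frac{(\lambda t)^n}{n!} + \int_0^t \lambda e^{-\lambda x}\frac{(\lambda x)^n}{n!}\,dx$$

which completes the proof.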
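
The link between exponential interarrival times and Poisson counts can be checked by simulation. The sketch below is an illustration only: it builds the process from independent exponential interarrival times with mean 1/λ (the structure described in Proposition 1.1) and compares the empirical distribution of N(t) with the Poisson mass function of Theorem 1.1. The names `count_events` and `empirical_pmf`, the rate 2, the horizon t = 3, and the number of runs are assumptions chosen for the example.

```python
import math
import random

def count_events(rate, t, rng):
    # Add up exponential interarrival times (mean 1/rate) and count how
    # many arrival times S_n fall in [0, t]; this count is N(t).
    n, arrival = 0, rng.expovariate(rate)
    while arrival <= t:
        n += 1
        arrival += rng.expovariate(rate)
    return n

def empirical_pmf(rate, t, runs, seed=0):
    # Relative frequency of each observed value of N(t) over many runs.
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        n = count_events(rate, t, rng)
        counts[n] = counts.get(n, 0) + 1
    return {n: c / runs for n, c in counts.items()}

def poisson_pmf(n, mean):
    # e^{-mean} mean^n / n!
    return math.exp(-mean) * mean**n / math.factorial(n)

rate, t = 2.0, 3.0
est = empirical_pmf(rate, t, runs=50_000)
for n in range(13):
    print(n, round(est.get(n, 0.0), 4), round(poisson_pmf(n, rate * t), 4))
```

The two printed columns should agree to within sampling error; the agreement is evidence for, not a proof of, the theorem.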


9.2 Markov Chains 


Consider a sequence of random variables $X_0, X_1, \ldots$, and suppose that the set of
possible values of these random variables is {0, 1, ..., M}. It will be helpful to interpret
$X_n$ as being the state of some system at time n, and, in accordance with this
interpretation, we say that the system is in state i at time n if $X_n = i$. The sequence
of random variables is said to form a Markov chain if, each time the system is in state
i, there is some fixed probability, call it $P_{ij}$, that the system will next be in state j.
That is, for all $i_0, \ldots, i_{n-1}, i, j$,

$$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij}$$

The values $P_{ij}$, $0 \le i \le M$, $0 \le j \le M$, are called the transition probabilities of the
Markov chain, and they satisfy

$$P_{ij} \ge 0 \qquad \sum_{j=0}^{M} P_{ij} = 1 \qquad i = 0, 1, \ldots, M$$

(Why?) It is convenient to arrange the transition probabilities $P_{ij}$ in a square array as
follows:

$$\begin{pmatrix}
P_{00} & P_{01} & \cdots & P_{0M} \\
P_{10} & P_{11} & \cdots & P_{1M} \\
\vdots & \vdots & & \vdots \\
P_{M0} & P_{M1} & \cdots & P_{MM}
\end{pmatrix}$$

Such an array is called a matrix.


Knowledge of the transition probability matrix and of the distribution of $X_0$ enables us,
in theory, to compute all probabilities of interest. For instance, the joint probability
mass function of $X_0, \ldots, X_n$ is given by

$$\begin{aligned}
P\{X_n = i_n, &X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} \\
&= P\{X_n = i_n \mid X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\}\,P\{X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} \\
&= P_{i_{n-1}, i_n}\,P\{X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\}
\end{aligned}$$

and continual repetition of this argument demonstrates that the preceding is equal to

$$P_{i_{n-1}, i_n} P_{i_{n-2}, i_{n-1}} \cdots P_{i_1, i_2} P_{i_0, i_1}\,P\{X_0 = i_0\}$$


Example 2a 


Suppose that whether it rains tomorrow depends on previous weather conditions
only through whether it is raining today. Suppose further that if it is raining today,
then it will rain tomorrow with probability $\alpha$, and if it is not raining today, then it will
rain tomorrow with probability $\beta$.


If we say that the system is in state 0 when it rains and state 1 when it does not,
then the preceding system is a two-state Markov chain having transition
probability matrix

$$\begin{pmatrix}
\alpha & 1 - \alpha \\
\beta & 1 - \beta
\end{pmatrix}$$

That is, $P_{00} = \alpha = 1 - P_{01}$, $P_{10} = \beta = 1 - P_{11}$.


Example 2b


Consider a gambler who either wins 1 unit with probability p or loses 1 unit with
probability 1 − p at each play of the game. If we suppose that the gambler will
quit playing when his fortune hits either 0 or M, then the gambler’s sequence of
fortunes is a Markov chain having transition probabilities

$$P_{i,i+1} = p = 1 - P_{i,i-1} \qquad i = 1, \ldots, M-1$$

$$P_{00} = P_{MM} = 1$$

Example 2c 


The husband-and-wife physicists Paul and Tatyana Ehrenfest considered a
conceptual model for the movement of molecules in which M molecules are
distributed among 2 urns. At each time point, one of the molecules is chosen at
random and is removed from its urn and placed in the other one. If we let $X_n$
denote the number of molecules in the first urn immediately after the nth
exchange, then $\{X_0, X_1, \ldots\}$ is a Markov chain with transition probabilities

$$P_{i,i+1} = \frac{M-i}{M} \qquad P_{i,i-1} = \frac{i}{M} \qquad 0 \le i \le M$$

with $P_{ij} = 0$ otherwise.


Thus, for a Markov chain, $P_{ij}$ represents the probability that a system in state i will
enter state j at the next transition. We can also define the two-stage transition
probability $P_{ij}^{(2)}$ that a system presently in state i will be in state j after two additional
transitions. That is,

$$P_{ij}^{(2)} = P\{X_{m+2} = j \mid X_m = i\}$$

The $P_{ij}^{(2)}$ can be computed from the $P_{ij}$ as follows:

$$\begin{aligned}
P_{ij}^{(2)} &= P\{X_2 = j \mid X_0 = i\} \\
&= \sum_{k=0}^{M} P\{X_2 = j, X_1 = k \mid X_0 = i\} \\
&= \sum_{k=0}^{M} P\{X_2 = j \mid X_1 = k, X_0 = i\}\,P\{X_1 = k \mid X_0 = i\} \\
&= \sum_{k=0}^{M} P_{kj} P_{ik}
\end{aligned}$$

In general, we define the n-stage transition probabilities, denoted as $P_{ij}^{(n)}$, by

$$P_{ij}^{(n)} = P\{X_{n+m} = j \mid X_m = i\}$$

Proposition 2.1, known as the Chapman–Kolmogorov equations, shows how the
$P_{ij}^{(n)}$ can be computed.


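Proposition 2.1 The Chapman–Kolmogorov Equations

$$P_{ij}^{(n)} = \sum_{k=0}^{M} P_{ik}^{(r)} P_{kj}^{(n-r)} \qquad \text{for all } 0 < r < n$$

Proof

$$\begin{aligned}
P_{ij}^{(n)} &= P\{X_n = j \mid X_0 = i\} \\
&= \sum_{k} P\{X_n = j, X_r = k \mid X_0 = i\} \\
&= \sum_{k} P\{X_n = j \mid X_r = k, X_0 = i\}\,P\{X_r = k \mid X_0 = i\} \\
&= \sum_{k} P_{kj}^{(n-r)} P_{ik}^{(r)}
\end{aligned}$$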
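
In matrix terms, the Chapman–Kolmogorov equations say that the n-stage transition matrix equals the product of the r-stage and (n − r)-stage matrices. The sketch below checks this numerically on the gambler's chain of Example 2b; the helper names `mat_mul` and `n_step` and the values M = 4, p = 0.4, n = 5, r = 2 are illustrative assumptions.

```python
def mat_mul(A, B):
    # Ordinary product of two square matrices stored as lists of rows.
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def n_step(P, n):
    # n-stage transition matrix: start from the identity and multiply by P n times.
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Gambler's chain of Example 2b with illustrative values M = 4, p = 0.4.
M, p = 4, 0.4
P = [[0.0] * (M + 1) for _ in range(M + 1)]
P[0][0] = P[M][M] = 1.0            # states 0 and M are absorbing
for i in range(1, M):
    P[i][i + 1] = p                # win one unit
    P[i][i - 1] = 1 - p            # lose one unit

lhs = n_step(P, 5)                              # P^(5)
rhs = mat_mul(n_step(P, 2), n_step(P, 3))       # sum_k P_ik^(2) P_kj^(3)
gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(M + 1) for j in range(M + 1))
print(gap)
```

The printed gap should be zero up to floating-point rounding, and every row of `lhs` should still sum to 1.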


Example 2d A random walk 


An example of a Markov chain having a countably infinite state space is the
random walk, which tracks a particle as it moves along a one-dimensional axis.
Suppose that at each point in time, the particle will move either one step to the
right or one step to the left with respective probabilities p and 1 − p. That is,
suppose the particle’s path follows a Markov chain with transition probabilities

$$P_{i,i+1} = p = 1 - P_{i,i-1} \qquad i = 0, \pm 1, \ldots$$

If the particle is at state i, then the probability that it will be at state j after n
transitions is the probability that (n − i + j)/2 of these steps are to the right and
n − [(n − i + j)/2] = (n + i − j)/2 are to the left. Since each step will be to the
right, independently of the other steps, with probability p, it follows that the
preceding is just the binomial probability

$$P_{ij}^{(n)} = \binom{n}{(n-i+j)/2} p^{(n-i+j)/2}(1-p)^{(n+i-j)/2}$$

where $\binom{n}{x}$ is taken to equal 0 when x is not a nonnegative integer less than or
equal to n. The preceding formula can be rewritten as

$$P_{i,i+2k}^{(2n)} = \binom{2n}{n+k} p^{n+k}(1-p)^{n-k} \qquad k = 0, \pm 1, \ldots, \pm n$$

$$P_{i,i+2k+1}^{(2n+1)} = \binom{2n+1}{n+k+1} p^{n+k+1}(1-p)^{n-k} \qquad k = 0, \pm 1, \ldots, \pm n, -(n+1)$$
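
The binomial form of the n-step random walk probabilities can be compared with a brute-force sum over all step sequences. This is a sketch for small n only; the function names `walk_prob` and `walk_prob_brute` and the values n = 6, p = 0.3 are assumptions made for the illustration.

```python
import math
from itertools import product

def walk_prob(i, j, n, p):
    # Binomial formula: r = (n - i + j)/2 steps to the right out of n,
    # taken to be 0 unless r is an integer between 0 and n.
    if (n - i + j) % 2 != 0:
        return 0.0
    r = (n - i + j) // 2
    if r < 0 or r > n:
        return 0.0
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def walk_prob_brute(i, j, n, p):
    # Add the probabilities of all 2^n sequences of +1/-1 steps that end at j.
    total = 0.0
    for steps in product((1, -1), repeat=n):
        if i + sum(steps) == j:
            rights = steps.count(1)
            total += p**rights * (1 - p)**(n - rights)
    return total

n, p = 6, 0.3
for j in range(-n, n + 1):
    print(j, walk_prob(0, j, n, p), walk_prob_brute(0, j, n, p))
```

The enumeration grows as 2^n, so it is useful only as a check on the closed form, not as a way to compute it.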


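Although the $P_{ij}^{(n)}$ denote conditional probabilities, we can use them to derive
expressions for unconditional probabilities by conditioning on the initial state. For
instance,

$$\begin{aligned}
P\{X_n = j\} &= \sum_{i} P\{X_n = j \mid X_0 = i\}\,P\{X_0 = i\} \\
&= \sum_{i} P_{ij}^{(n)}\,P\{X_0 = i\}
\end{aligned}$$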


For a large number of Markov chains, it turns out that $P_{ij}^{(n)}$ converges, as $n \to \infty$, to a
value $\pi_j$ that depends only on j. That is, for large values of n, the probability of being
in state j after n transitions is approximately equal to $\pi_j$, no matter what the initial
state was. It can be shown that a sufficient condition for a Markov chain to possess
this property is that for some n > 0,

$$P_{ij}^{(n)} > 0 \quad \text{for all } i, j = 0, 1, \ldots, M \tag{2.1}$$

Markov chains that satisfy Equation (2.1) are said to be ergodic. Since
Proposition 2.1 yields

$$P_{ij}^{(n+1)} = \sum_{k=0}^{M} P_{ik}^{(n)} P_{kj}$$

it follows, by letting $n \to \infty$, that for ergodic chains,

$$\pi_j = \sum_{k=0}^{M} \pi_k P_{kj} \tag{2.2}$$

Furthermore, since $1 = \sum_{j=0}^{M} P_{ij}^{(n)}$, we also obtain, by letting $n \to \infty$,

$$\sum_{j=0}^{M} \pi_j = 1 \tag{2.3}$$

In fact, it can be shown that the $\pi_j$, $0 \le j \le M$, are the unique nonnegative solutions
of Equations (2.2) and (2.3). All this is summed up in Theorem 2.1, which
we state without proof.


Theorem 2.1 


For an ergodic Markov chain,

$$\pi_j = \lim_{n \to \infty} P_{ij}^{(n)}$$

exists, and the $\pi_j$, $0 \le j \le M$, are the unique nonnegative solutions of

$$\pi_j = \sum_{k=0}^{M} \pi_k P_{kj}$$

$$\sum_{j=0}^{M} \pi_j = 1$$


Example 2e 


Consider Example 2a, in which we assume that if it rains today, then it will
rain tomorrow with probability $\alpha$, and if it does not rain today, then it will rain
tomorrow with probability $\beta$. From Theorem 2.1, it follows that the limiting
probabilities $\pi_0$ and $\pi_1$ of rain and of no rain, respectively, are given by

$$\begin{aligned}
\pi_0 &= \alpha\pi_0 + \beta\pi_1 \\
\pi_1 &= (1 - \alpha)\pi_0 + (1 - \beta)\pi_1 \\
\pi_0 + \pi_1 &= 1
\end{aligned}$$

which yields

$$\pi_0 = \frac{\beta}{1 + \beta - \alpha} \qquad \pi_1 = \frac{1 - \alpha}{1 + \beta - \alpha}$$

For instance, if $\alpha = .6$ and $\beta = .3$, then the limiting probability of rain on the nth
day is $\pi_0 = \frac{3}{7}$.
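
The limiting probabilities of Example 2e can also be approached numerically by applying the transition matrix repeatedly to an initial distribution. The following is a sketch rather than a general-purpose solver: the function name `limiting_probabilities`, the choice of 200 iterations, and the starting state are assumptions, and the method relies on the chain being ergodic.

```python
def limiting_probabilities(P, steps=200):
    # Repeatedly apply pi <- pi P, starting with all probability on state 0.
    # For an ergodic chain the result approaches the limiting probabilities.
    size = len(P)
    pi = [1.0] + [0.0] * (size - 1)
    for _ in range(steps):
        pi = [sum(pi[k] * P[k][j] for k in range(size)) for j in range(size)]
    return pi

alpha, beta = 0.6, 0.3
P = [[alpha, 1 - alpha],      # state 0: rain today
     [beta, 1 - beta]]        # state 1: no rain today
pi = limiting_probabilities(P)
closed_form = [beta / (1 + beta - alpha), (1 - alpha) / (1 + beta - alpha)]
print(pi, closed_form)
```

Both lists should agree with the values 3/7 and 4/7 from the example, up to floating-point rounding.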


The quantity $\pi_j$ is also equal to the long-run proportion of time that the Markov chain
is in state j, j = 0, ..., M. To see intuitively why this might be so, let $P_j$ denote the
long-run proportion of time the chain is in state j. (It can be proven using the strong
law of large numbers that for an ergodic chain, such long-run proportions exist and
are constants.) Now, since the proportion of time the chain is in state k is $P_k$, and
since, when in state k, the chain goes to state j with probability $P_{kj}$, it follows that the
proportion of time the Markov chain is entering state j from state k is equal to $P_k P_{kj}$.
Summing over all k shows that $P_j$, the proportion of time the Markov chain is
entering state j, satisfies

$$P_j = \sum_{k} P_k P_{kj}$$

Since clearly it is also true that

$$\sum_{j} P_j = 1$$

it thus follows, since by Theorem 2.1 the $\pi_j$, j = 0, ..., M are the unique solution of
the preceding, that $P_j = \pi_j$, j = 0, ..., M. The long-run proportion interpretation of $\pi_j$
is generally valid even when the chain is not ergodic.


Example 2f 


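Suppose in Example 2c that we are interested in the proportion of time that
there are j molecules in urn 1, j = 0, ..., M. By Theorem 2.1, these quantities
will be the unique solution of

$$\begin{aligned}
\pi_0 &= \pi_1 \times \frac{1}{M} \\
\pi_j &= \pi_{j-1} \times \frac{M-j+1}{M} + \pi_{j+1} \times \frac{j+1}{M} \qquad j = 1, \ldots, M-1 \\
\pi_M &= \pi_{M-1} \times \frac{1}{M} \\
\sum_{j=0}^{M} \pi_j &= 1
\end{aligned}$$

However, as it is easily checked that

$$\pi_j = \binom{M}{j}\left(\frac{1}{2}\right)^M \qquad j = 0, \ldots, M$$

satisfy the preceding equations, it follows that these are the long-run proportions
of time that the Markov chain is in each of the states. (See Problem 9.11 for
an explanation of how one might have guessed at the foregoing solution.)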
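
A quick numerical check of the urn model is to build the transition matrix from the rule in Example 2c (a molecule chosen at random changes urns) and confirm that binomial probabilities with parameters M and 1/2 are left unchanged by one transition. The sketch below does this for an illustrative M = 6; the function names `ehrenfest_matrix` and `binomial_half` are assumptions introduced here.

```python
import math

def ehrenfest_matrix(M):
    # From state i (molecules in urn 1): move to i + 1 with probability
    # (M - i)/M and to i - 1 with probability i/M.
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    for i in range(M + 1):
        if i < M:
            P[i][i + 1] = (M - i) / M
        if i > 0:
            P[i][i - 1] = i / M
    return P

def binomial_half(M):
    # Candidate long-run proportions: C(M, j) (1/2)^M for j = 0, ..., M.
    return [math.comb(M, j) / 2**M for j in range(M + 1)]

M = 6
P = ehrenfest_matrix(M)
pi = binomial_half(M)
pi_next = [sum(pi[k] * P[k][j] for k in range(M + 1)) for j in range(M + 1)]
print(max(abs(a - b) for a, b in zip(pi, pi_next)))
```

A printed value at the level of rounding error indicates that the candidate proportions satisfy the stationarity equations for this M.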


9.3 Surprise, Uncertainty, and Entropy 


Consider an event E that can occur when an experiment is performed. How
surprised would we be to hear that E does, in fact, occur? It seems reasonable to
suppose that the amount of surprise engendered by the information that E has
occurred should depend on the probability of E. For instance, if the experiment
consists of rolling a pair of dice, then we would not be too surprised to hear that E
has occurred when E represents the event that the sum of the dice is even (and thus
has probability $\frac{1}{2}$), whereas we would certainly be more surprised to hear that E has
occurred when E is the event that the sum of the dice is 12 (and thus has probability
$\frac{1}{36}$).

In this section, we attempt to quantify the concept of surprise. To begin, let us agree
to suppose that the surprise one feels upon learning that an event E has occurred
depends only on the probability of E, and let us denote by S(p) the surprise evoked
by the occurrence of an event having probability p. We determine the functional form
of S(p) by first agreeing on a set of reasonable conditions that S(p) should satisfy
and then proving that these axioms require that S(p) have a specified form. We
assume throughout that S(p) is defined for all $0 < p \le 1$ but is not defined for events
having p = 0.


Our first condition is just a statement of the intuitive fact that there is no surprise in 
hearing that an event that is sure to occur has indeed occurred. 


Axiom 1
S(1) = 0


Our second condition states that the more unlikely an event is to occur, the greater is
the surprise evoked by its occurrence.


Axiom 2
S(p) is a strictly decreasing function of p; that is, if p < q, then S(p) > S(q).


The third condition is a mathematical statement of the fact that we would intuitively
expect a small change in p to correspond to a small change in S(p).


Axiom 3
S(p) is a continuous function of p.


To motivate the final condition, consider two independent events E and F having
respective probabilities P(E) = p and P(F) = q. Since P(EF) = pq, the surprise
evoked by the information that both E and F have occurred is S(pq). Now, suppose
that we are told first that E has occurred and then, afterward, that F has also
occurred. Since S(p) is the surprise evoked by the occurrence of E, it follows that
S(pq) - S(p) represents the additional surprise evoked when we are informed that F
has also occurred. However, because F is independent of E, the knowledge that E
occurred does not change the probability of F; hence, the additional surprise should
just be S(q). This reasoning suggests the final condition.


Axiom 4 
\[ S(pq) = S(p) + S(q), \qquad 0 < p \le 1,\ 0 < q \le 1 \]


We are now ready for Theorem 3.1, which yields the structure of S(p).
Theorem 3.1
If $S(\cdot)$ satisfies Axioms 1 through 4, then
\[ S(p) = -C \log_2 p \]
where C is an arbitrary positive constant.
Proof It follows from Axiom 4 that
\[ S(p^2) = S(p) + S(p) = 2S(p) \]
and by induction that
\[ S(p^m) = mS(p) \tag{3.1} \]

Also, since, for any integral $n$, $S(p) = S(p^{1/n} \cdots p^{1/n}) = nS(p^{1/n})$, it follows that
\[ S(p^{1/n}) = \frac{1}{n} S(p) \tag{3.2} \]

Thus, from Equations (3.1) and (3.2), we obtain
\[ S(p^{m/n}) = mS(p^{1/n}) = \frac{m}{n} S(p) \]
which is equivalent to
\[ S(p^x) = xS(p) \tag{3.3} \]
whenever x is a positive rational number. But by the continuity of S (Axiom 3),
it follows that Equation (3.3) is valid for all nonnegative x. (Reason this out.)

Now, for any p, $0 < p \le 1$, let $x = -\log_2 p$. Then $p = \left(\tfrac{1}{2}\right)^x$, and from Equation
(3.3),
\[ S(p) = S\!\left(\left(\tfrac{1}{2}\right)^x\right) = xS\!\left(\tfrac{1}{2}\right) = -C \log_2 p \]
where $C = S\!\left(\tfrac{1}{2}\right) > S(1) = 0$ by Axioms 2 and 1.


It is usual to let C equal 1, in which case the surprise is said to be expressed in units 
of bits (short for binary digits). 
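
As a quick numerical check of Theorem 3.1, the following minimal sketch evaluates the surprise function with C = 1 for the two dice events discussed at the start of the section. The function name `surprise` and its argument names are illustrative choices, not notation from the text.

```python
import math

def surprise(p: float, C: float = 1.0) -> float:
    """Surprise S(p) = -C * log2(p) of an event with probability p, 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -C * math.log2(p)

# Sum of two dice is even (probability 1/2) versus sum equal to 12 (probability 1/36).
even_sum = surprise(1 / 2)
twelve = surprise(1 / 36)

# Axiom 4: the surprise of two independent events adds.
both = surprise((1 / 2) * (1 / 36))
```

With C = 1 the values are in bits, so the even-sum event carries exactly one bit of surprise, while the sum-of-12 event carries a little more than five.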


Next, consider a random variable X that must take on one of the values $x_1, \ldots, x_n$ with
respective probabilities $p_1, \ldots, p_n$. Since $-\log p_i$ represents the surprise evoked if X
takes on the value $x_i$,† it follows that the expected amount of surprise we shall
receive upon learning the value of X is given by
\[ H(X) = -\sum_{i=1}^{n} p_i \log p_i \]

†For the remainder of this chapter, we write log x for $\log_2 x$. Also, we
use ln x for $\log_e x$.


The quantity H(X) is known in information theory as the entropy of the random 
variable X. (In case one of the $p_i = 0$, we take 0 log 0 to equal 0.) It can be shown
(and we leave it as an exercise) that H(X) is maximized when all of the $p_i$ are equal.
(Is this intuitive?) 
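
The definition of H(X) translates directly into a few lines of code. The sketch below is only an illustration: the function name `entropy` is an assumption, and the probabilities are taken to be nonnegative and to sum to 1. Terms with $p_i = 0$ are skipped, which implements the convention 0 log 0 = 0.

```python
import math

def entropy(probs):
    """H(X) = -sum_i p_i * log2(p_i), in bits, with 0 log 0 taken to equal 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = entropy([0.25, 0.25, 0.25, 0.25])    # all four values equally likely
skewed = entropy([0.5, 0.25, 0.125, 0.125])    # same four values, unequal probabilities
certain = entropy([1.0, 0.0, 0.0, 0.0])        # no uncertainty at all
```

Comparing `uniform` with `skewed` gives one concrete instance of the claim that equal probabilities maximize the entropy.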




Since H(X) represents the average amount of surprise one receives upon learning 
the value of X, it can also be interpreted as representing the amount of uncertainty 
that exists as to the value of X. In fact, in information theory, H(X) is interpreted as 
the average amount of information received when the value of X is observed. Thus, 
the average surprise evoked by X, the uncertainty of X, or the average amount of 
information yielded by X all represent the same concept viewed from three slightly 
different points of view. 


Now consider two random variables X and Y that take on the respective values
$x_1, \ldots, x_n$ and $y_1, \ldots, y_m$ with joint mass function
\[ p(x_i, y_j) = P\{X = x_i, Y = y_j\} \]
It follows that the uncertainty as to the value of the random vector (X, Y), denoted by
H(X, Y), is given by
\[ H(X, Y) = -\sum_i \sum_j p(x_i, y_j) \log p(x_i, y_j) \]

Suppose now that Y is observed to equal $y_j$. In this situation, the amount of
uncertainty remaining in X is given by
\[ H_{Y = y_j}(X) = -\sum_i p(x_i \mid y_j) \log p(x_i \mid y_j) \]
where
\[ p(x_i \mid y_j) = P\{X = x_i \mid Y = y_j\} \]

Hence, the average amount of uncertainty that will remain in X after Y is observed is
given by
\[ H_Y(X) = \sum_j H_{Y = y_j}(X)\, p_Y(y_j) \]
where
\[ p_Y(y_j) = P\{Y = y_j\} \]
Proposition 3.1 relates H(X, Y) to H(Y) and $H_Y(X)$. It states that the uncertainty
as to the value of X and Y is equal to the uncertainty of Y plus the average
uncertainty remaining in X when Y is to be observed.

Proposition 3.1
\[ H(X, Y) = H(Y) + H_Y(X) \]

Proof Using the identity $p(x_i, y_j) = p_Y(y_j)\, p(x_i \mid y_j)$ yields
\begin{align*}
H(X, Y) &= -\sum_i \sum_j p(x_i, y_j) \log p(x_i, y_j) \\
&= -\sum_i \sum_j p_Y(y_j)\, p(x_i \mid y_j) \left[\log p_Y(y_j) + \log p(x_i \mid y_j)\right] \\
&= -\sum_j p_Y(y_j) \log p_Y(y_j) \sum_i p(x_i \mid y_j) - \sum_j p_Y(y_j) \sum_i p(x_i \mid y_j) \log p(x_i \mid y_j) \\
&= H(Y) + H_Y(X)
\end{align*}
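
Proposition 3.1 is easy to check numerically on a small joint mass function. The sketch below is illustrative only: the joint distribution and all function names are assumptions chosen for the example, with the joint mass function stored as a dictionary keyed by (x, y).

```python
import math
from collections import defaultdict

def H(probs):
    """Entropy in bits of a collection of probabilities (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal mass function of X (axis=0) or Y (axis=1) from {(x, y): p}."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[key[axis]] += p
    return dict(out)

def conditional_entropy(joint):
    """H_Y(X) = sum_j p_Y(y_j) * H_{Y=y_j}(X)."""
    total = 0.0
    for y, py in marginal(joint, 1).items():
        if py > 0:
            cond = [p / py for (_, yy), p in joint.items() if yy == y]
            total += py * H(cond)
    return total

# Hypothetical joint mass function of (X, Y), chosen only for illustration.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

h_xy = H(joint.values())
h_y = H(marginal(joint, 1).values())
h_x = H(marginal(joint, 0).values())
h_x_given_y = conditional_entropy(joint)
```

For this distribution `h_xy` agrees with `h_y + h_x_given_y` up to rounding, as the proposition requires.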


It is a fundamental result in information theory that the amount of uncertainty in a 
random variable X will, on the average, decrease when a second random variable Y 
is observed. Before proving this statement, we need the following lemma, whose 
proof is left as an exercise. 


Lemma 3.1
\[ \ln x \le x - 1, \qquad x > 0 \]
with equality only at x = 1.


Theorem 3.2
\[ H_Y(X) \le H(X) \]
with equality if and only if X and Y are independent.


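Proof
\begin{align*}
H_Y(X) - H(X) &= -\sum_i \sum_j p(x_i \mid y_j) \log[p(x_i \mid y_j)]\, p_Y(y_j) + \sum_i \sum_j p(x_i, y_j) \log p(x_i) \\
&= \sum_i \sum_j p(x_i, y_j) \log\left[\frac{p(x_i)}{p(x_i \mid y_j)}\right] \\
&\le \log e \sum_i \sum_j p(x_i, y_j) \left[\frac{p(x_i)}{p(x_i \mid y_j)} - 1\right] \quad \text{by Lemma 3.1} \\
&= \log e \left[\sum_i \sum_j p(x_i)\, p_Y(y_j) - \sum_i \sum_j p(x_i, y_j)\right] \\
&= \log e\,[1 - 1] \\
&= 0
\end{align*}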


9.4 Coding Theory and Entropy 


Suppose that the value of a discrete random vector X is to be observed at location A 
and then transmitted to location B via a communication network that consists of two 
signals, 0 and 1. In order to do this, it is first necessary to encode each possible
value of X in terms of a sequence of 0’s and 1’s. To avoid any ambiguity, it is usually 
required that no encoded sequence can be obtained from a shorter encoded 
sequence by adding more terms to the shorter. 


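For instance, if X can take on four possible values $x_1$, $x_2$, $x_3$, and $x_4$, then one
possible coding would be
\[
\begin{array}{l}
x_1 \leftrightarrow 00 \\
x_2 \leftrightarrow 01 \\
x_3 \leftrightarrow 10 \\
x_4 \leftrightarrow 11
\end{array}
\tag{4.1}
\]

That is, if $X = x_1$, then the message 00 is sent to location B, whereas if $X = x_2$, then
01 is sent to B, and so on. A second possible coding is
\[
\begin{array}{l}
x_1 \leftrightarrow 0 \\
x_2 \leftrightarrow 10 \\
x_3 \leftrightarrow 110 \\
x_4 \leftrightarrow 111
\end{array}
\tag{4.2}
\]

However, a coding such as
\[
\begin{array}{l}
x_1 \leftrightarrow 0 \\
x_2 \leftrightarrow 1 \\
x_3 \leftrightarrow 00 \\
x_4 \leftrightarrow 01
\end{array}
\]
is not allowed because the coded sequences for $x_3$ and $x_4$ are both extensions of the
one for $x_1$.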
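
The "no extension" requirement can be checked mechanically. The following sketch is an illustration under stated assumptions: codewords are represented as Python strings of 0's and 1's, and the function name `is_prefix_free` is hypothetical.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is an extension (or a duplicate) of another."""
    words = sorted(codewords)
    # In sorted order, if a word is a prefix of any later word, it is also a
    # prefix of its immediate successor, so comparing neighbours is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

code_41 = ["00", "01", "10", "11"]     # coding (4.1)
code_42 = ["0", "10", "110", "111"]    # coding (4.2)
bad_code = ["0", "1", "00", "01"]      # the disallowed coding
```

Calling `is_prefix_free` on the three lists accepts the two codings of Equations (4.1) and (4.2) and rejects the third.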


One of the objectives in devising a code is to minimize the expected number of bits
(that is, binary digits) that need to be sent from location A to location B. For example,
if
\[
P\{X = x_1\} = \tfrac{1}{2}, \quad P\{X = x_2\} = \tfrac{1}{4}, \quad P\{X = x_3\} = \tfrac{1}{8}, \quad P\{X = x_4\} = \tfrac{1}{8}
\]
then the code given by Equation (4.2) would expect to send
$\tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(3) = 1.75$ bits, whereas the code given by Equation (4.1)
would expect to send 2 bits. Hence, for the preceding set of probabilities, the
encoding in Equation (4.2) is more efficient than that in Equation (4.1).
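
The comparison of the two codings can be reproduced with exact arithmetic. This is a small sketch in which the helper name `expected_bits` is an assumption; the probabilities and codewords are those given above.

```python
from fractions import Fraction

probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
code_41 = ["00", "01", "10", "11"]     # coding (4.1)
code_42 = ["0", "10", "110", "111"]    # coding (4.2)

def expected_bits(probs, code):
    """Expected number of bits sent: sum of P{X = x_i} times codeword length."""
    return sum(p * len(word) for p, word in zip(probs, code))

bits_41 = expected_bits(probs, code_41)
bits_42 = expected_bits(probs, code_42)
```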


The preceding discussion raises the following question: For a given random vector X, 
what is the maximum efficiency achievable by an encoding scheme? The answer is 
that for any coding, the average number of bits that will be sent is at least as large as 
the entropy of X. To prove this result, known in information theory as the noiseless 
coding theorem, we shall need Lemma 4.1.


Lemma 4.1 


Let X take on the possible values $x_1, \ldots, x_N$. Then, in order to be able to encode
the values of X in binary sequences (none of which is an extension of another) of
respective lengths $n_1, \ldots, n_N$, it is necessary and sufficient that
\[ \sum_{i=1}^{N} \left(\frac{1}{2}\right)^{n_i} \le 1 \]


Proof For a fixed set of N positive integers $n_1, \ldots, n_N$, let $w_j$ denote the number of
the $n_i$ that are equal to j, j = 1, .... For there to be a coding that assigns $n_i$ bits to
the value $x_i$, i = 1, ..., N, it is clearly necessary that $w_1 \le 2$. Furthermore,
because no binary sequence is allowed to be an extension of any other, we must
have $w_2 \le 2^2 - 2w_1$. (This follows because $2^2$ is the number of binary
sequences of length 2, whereas $2w_1$ is the number of sequences that are
extensions of the $w_1$ binary sequences of length 1.) In general, the same
reasoning shows that we must have
\[ w_n \le 2^n - w_1 2^{n-1} - w_2 2^{n-2} - \cdots - w_{n-1} 2 \tag{4.3} \]
for n = 1, .... In fact, a little thought should convince the reader that these
conditions are not only necessary, but also sufficient for a code to exist that
assigns $n_i$ bits to $x_i$, i = 1, ..., N.

Rewriting inequality (4.3) as
\[ w_n + w_{n-1} 2 + w_{n-2} 2^2 + \cdots + w_1 2^{n-1} \le 2^n, \qquad n = 1, \ldots \]
and dividing by $2^n$ yields the necessary and sufficient conditions, namely,
\[ \sum_{j=1}^{n} w_j \left(\frac{1}{2}\right)^j \le 1 \quad \text{for all } n \tag{4.4} \]

However, because $\sum_{j=1}^{n} w_j \left(\frac{1}{2}\right)^j$ is increasing in n, it follows that Equation
(4.4) will be true if and only if
\[ \sum_{j=1}^{\infty} w_j \left(\frac{1}{2}\right)^j \le 1 \]

The result is now established, since, by the definition of $w_j$ as the number of $n_i$
that equal j, it follows that
\[ \sum_{j=1}^{\infty} w_j \left(\frac{1}{2}\right)^j = \sum_{i=1}^{N} \left(\frac{1}{2}\right)^{n_i} \]
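
The sufficiency half of Lemma 4.1 can be made concrete: whenever the lengths satisfy the inequality, a code can be built by handing out codewords in order of increasing length. The sketch below uses one standard construction (counting upward in binary and appending zeros when the length grows); it is offered as an illustration, not as the argument the proof has in mind, and the names `kraft_sum` and `build_code` are hypothetical.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of sum_i (1/2)^(n_i)."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

def build_code(lengths):
    """Return binary codewords with the given lengths, none an extension of
    another, or None when the condition of Lemma 4.1 fails."""
    if kraft_sum(lengths) > 1:
        return None
    code = [None] * len(lengths)
    value, prev_len = 0, 0
    # Assign codewords from shortest to longest.
    for i in sorted(range(len(lengths)), key=lambda k: lengths[k]):
        n = lengths[i]
        value <<= n - prev_len            # pad with zeros up to the new length
        code[i] = format(value, "b").zfill(n)
        value += 1                        # next unused sequence of this length
        prev_len = n
    return code

ok = build_code([1, 2, 3, 3])    # lengths used by coding (4.2)
impossible = build_code([1, 1, 2])
```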


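We are now ready to prove Theorem 4.1.

Theorem 4.1 The noiseless coding theorem

Let X take on the values $x_1, \ldots, x_N$ with respective probabilities $p(x_1), \ldots, p(x_N)$.
Then, for any coding of X that assigns $n_i$ bits to $x_i$,
\[ \sum_{i=1}^{N} n_i\, p(x_i) \ge H(X) = -\sum_{i=1}^{N} p(x_i) \log p(x_i) \]

Proof Let $P_i = p(x_i)$ and $q_i = 2^{-n_i} \big/ \sum_{j=1}^{N} 2^{-n_j}$, i = 1, ..., N. Then
\begin{align*}
-\sum_{i=1}^{N} P_i \log\left(\frac{P_i}{q_i}\right) &= \log e \sum_{i=1}^{N} P_i \ln\left(\frac{q_i}{P_i}\right) \\
&\le \log e \sum_{i=1}^{N} P_i \left(\frac{q_i}{P_i} - 1\right) \quad \text{by Lemma 3.1} \\
&= 0 \quad \text{since } \sum_{i=1}^{N} P_i = \sum_{i=1}^{N} q_i = 1
\end{align*}
Hence,
\begin{align*}
-\sum_{i=1}^{N} P_i \log P_i &\le -\sum_{i=1}^{N} P_i \log q_i \\
&= \sum_{i=1}^{N} n_i P_i + \log\left(\sum_{j=1}^{N} 2^{-n_j}\right) \\
&\le \sum_{i=1}^{N} n_i P_i \quad \text{by Lemma 4.1}
\end{align*}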
Example 4a 


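Consider a random variable X with probability mass function
\[ p(x_1) = \tfrac{1}{2}, \quad p(x_2) = \tfrac{1}{4}, \quad p(x_3) = p(x_4) = \tfrac{1}{8} \]

Since
\begin{align*}
H(X) &= -\left[\tfrac{1}{2} \log \tfrac{1}{2} + \tfrac{1}{4} \log \tfrac{1}{4} + \tfrac{1}{4} \log \tfrac{1}{8}\right] \\
&= \tfrac{1}{2} + \tfrac{2}{4} + \tfrac{3}{4} \\
&= 1.75
\end{align*}
it follows from Theorem 4.1 that there is no more efficient coding scheme
than
\[
\begin{array}{l}
x_1 \leftrightarrow 0 \\
x_2 \leftrightarrow 10 \\
x_3 \leftrightarrow 110 \\
x_4 \leftrightarrow 111
\end{array}
\]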


For most random vectors, there does not exist a coding for which the average 
number of bits sent attains the lower bound H(X). However, it is always possible to 
devise a code such that the average number of bits is within 1 of H(X). To prove 
this, define $n_i$ to be the integer satisfying
\[ -\log p(x_i) \le n_i < -\log p(x_i) + 1 \]

Now,
\[ \sum_{i=1}^{N} 2^{-n_i} \le \sum_{i=1}^{N} 2^{\log p(x_i)} = \sum_{i=1}^{N} p(x_i) = 1 \]
so, by Lemma 4.1, we can associate sequences of bits having lengths $n_i$ with the
$x_i$, i = 1, ..., N. The average length of such a sequence,
\[ L = \sum_{i=1}^{N} n_i\, p(x_i) \]
satisfies
\[ -\sum_{i=1}^{N} p(x_i) \log p(x_i) \le L < -\sum_{i=1}^{N} p(x_i) \log p(x_i) + 1 \]
or
\[ H(X) \le L < H(X) + 1 \]
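
The choice of lengths in this argument is easy to try out numerically. In the sketch below, the probability vector is a hypothetical example (not from the text) and the function names are assumptions; every probability is taken to be strictly positive so that the logarithm is defined.

```python
import math

def shannon_lengths(probs):
    """n_i = the integer with -log2 p(x_i) <= n_i < -log2 p(x_i) + 1."""
    return [math.ceil(-math.log2(p)) for p in probs]

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.4, 0.3, 0.2, 0.1]                       # illustrative distribution
lengths = shannon_lengths(probs)
kraft = sum(2.0 ** -n for n in lengths)            # must not exceed 1 (Lemma 4.1)
avg_len = sum(n * p for n, p in zip(lengths, probs))
h = entropy(probs)
```

For this distribution the lengths come out as 2, 2, 3, 4, the sum in Lemma 4.1 is below 1, and the average length lies between H(X) and H(X) + 1.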
Example 4b


Suppose that 10 independent tosses of a coin having probability p of coming up
heads are made at location A and the result is to be transmitted to location B.
The outcome of this experiment is a random vector $X = (X_1, \ldots, X_{10})$, where $X_i$ is
1 or 0 according to whether or not the outcome of the ith toss is heads. By the
results of this section, it follows that L, the average number of bits transmitted by
any code, satisfies
\[ H(X) \le L \]
with
\[ L < H(X) + 1 \]
for at least one code. Now, since the $X_i$ are independent, it follows from
Proposition 3.1 and Theorem 3.2 that
\begin{align*}
H(X) = H(X_1, \ldots, X_{10}) &= \sum_{i=1}^{10} H(X_i) \\
&= -10[p \log p + (1 - p) \log(1 - p)]
\end{align*}

If $p = \tfrac{1}{2}$, then H(X) = 10, and it follows that we can do no better than just
encoding X by its actual value. For example, if the first 5 tosses come up heads
and the last 5 tails, then the message 1111100000 is transmitted to location B.

However, if $p \ne \tfrac{1}{2}$, we can often do better by using a different coding scheme. For
instance, if $p = \tfrac{1}{4}$, then
\[ H(X) = -10\left(\tfrac{1}{4} \log \tfrac{1}{4} + \tfrac{3}{4} \log \tfrac{3}{4}\right) = 8.11 \]


Thus, there is an encoding for which the average length of the encoded message 
is no greater than 9.11. 


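One simple coding that is more efficient in this case than the identity code is to
break up $(X_1, \ldots, X_{10})$ into 5 pairs of 2 random variables each and then, for
i = 1, 3, 5, 7, 9, code each of the pairs as follows:
\[
\begin{array}{l}
X_i = 0,\ X_{i+1} = 0 \leftrightarrow 0 \\
X_i = 0,\ X_{i+1} = 1 \leftrightarrow 10 \\
X_i = 1,\ X_{i+1} = 0 \leftrightarrow 110 \\
X_i = 1,\ X_{i+1} = 1 \leftrightarrow 111
\end{array}
\]
The total message transmitted is the successive encodings of the preceding
pairs.

For instance, if the outcome TTTHHTTTTH is observed, then the message
010110010 is sent. The average number of bits needed to transmit the message
with this code is
\[
5\left[1\left(\tfrac{3}{4}\right)^2 + 2\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right) + 3\left(\tfrac{3}{4}\right)\left(\tfrac{1}{4}\right) + 3\left(\tfrac{1}{4}\right)^2\right] = \tfrac{135}{16} \approx 8.44
\]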
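
The pair coding of Example 4b can be replayed in a few lines. This sketch writes H for heads (1) and T for tails (0); the dictionary and function names are illustrative assumptions.

```python
from fractions import Fraction

# Codeword for each pair (X_i, X_{i+1}), with T = 0 and H = 1.
PAIR_CODE = {"TT": "0", "TH": "10", "HT": "110", "HH": "111"}

def encode(tosses):
    """Encode a string of H/T outcomes of even length, two tosses at a time."""
    pairs = [tosses[i:i + 2] for i in range(0, len(tosses), 2)]
    return "".join(PAIR_CODE[pair] for pair in pairs)

def expected_length(p, n_pairs=5):
    """Average number of bits sent when each toss is heads with probability p."""
    q = 1 - p
    prob = {"TT": q * q, "TH": q * p, "HT": p * q, "HH": p * p}
    return n_pairs * sum(prob[k] * len(PAIR_CODE[k]) for k in PAIR_CODE)

message = encode("TTTHHTTTTH")
average = expected_length(Fraction(1, 4))
```

The exact average is 135/16 bits, which sits between the entropy bound of about 8.11 and the 10 bits used by the identity code.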


Up to this point, we have assumed that the message sent at location A is received 
without error at location B. However, there are always certain errors that can occur 
because of random disturbances along the communications channel. Such random 
disturbances might lead, for example, to the message 00101101, sent at A, being 
received at B in the form 01101101. 


Let us suppose that a bit transmitted at location A will be correctly received at 
location B with probability p, independently from bit to bit. Such a communications 
system is called a binary symmetric channel. Suppose further that p = .8 and we 
want to transmit a message consisting of a large number of bits from A to B. Thus, 
direct transmission of the message will result in an error probability of .20 for each 
bit, which is quite high. One way to reduce this probability of bit error would be to 
transmit each bit 3 times and then decode by majority rule. That is, we could use the 
following scheme: 


Encode        Decode
0 → 000       000, 001, 010, 100 → 0
1 → 111       111, 110, 101, 011 → 1

Note that if no more than one error occurs in transmission, then the bit will be 
correctly decoded. Hence, the probability of bit error is reduced to 


\[ (.2)^3 + 3(.2)^2(.8) = .104 \]


a considerable improvement. In fact, it is clear that we can make the probability of bit 
error as small as we want by repeating the bit many times and then decoding by 
majority rule. For instance, the scheme

Encode                      Decode
0 → string of 17 0's        By majority rule
1 → string of 17 1's


will reduce the probability of bit error to below .01. 
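
Both error probabilities quoted above follow from a binomial tail: a repeated bit is decoded wrongly exactly when more than half of its copies are corrupted. The sketch below assumes an odd number of repetitions; the function name `majority_error` is hypothetical.

```python
from math import comb

def majority_error(n, p_correct):
    """Probability that majority-rule decoding of n copies (n odd) is wrong,
    when each copy is received correctly with probability p_correct."""
    q = 1 - p_correct                 # probability that a single copy is flipped
    need = n // 2 + 1                 # number of flips that overturns the majority
    return sum(comb(n, k) * q ** k * p_correct ** (n - k)
               for k in range(need, n + 1))

err_3 = majority_error(3, 0.8)
err_17 = majority_error(17, 0.8)
```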


The problem with this type of encoding scheme is that although it decreases the 
probability of bit error, it does so at the cost of also decreasing the effective rate of 
bits sent per signal. (See Table 9.1.) 


Table 9.1 Repetition of Bits Encoding Scheme.

Probability of error (per bit)    Rate (bits transmitted per signal)
.20                               1
.10                               .33 (= 1/3)
.01                               .06 (= 1/17)


In fact, at this point it may appear inevitable to the reader that decreasing the 
probability of bit error to 0 always results in also decreasing the effective rate at 
which bits are transmitted per signal to 0. However, a remarkable result of 
information theory known as the noisy coding theorem and due to Claude Shannon 
demonstrates that this is not the case. We now state this result as Theorem 4.2.


Theorem 4.2 The noisy coding theorem


There is a number C such that for any value R that is less than C, and for any
$\varepsilon > 0$, there exists a coding-decoding scheme that transmits at an average rate
of R bits sent per signal and with an error (per bit) probability of less than $\varepsilon$. The
largest such value of C (call it $C^*$†) is called the channel capacity, and for the
binary symmetric channel,
\[ C^* = 1 + p \log p + (1 - p) \log(1 - p) \]

†For an entropy interpretation of $C^*$, see Theoretical Exercise 9.18.
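
To get a feel for the size of the channel capacity, the formula can be evaluated for the channel with p = .8 considered above. This is a minimal sketch; the function name `bsc_capacity` is an assumption, and 0 log 0 is taken to equal 0 as elsewhere in the chapter.

```python
import math

def bsc_capacity(p):
    """C* = 1 + p log2 p + (1 - p) log2(1 - p) for a binary symmetric channel."""
    def xlog2x(x):
        return x * math.log2(x) if x > 0 else 0.0
    return 1 + xlog2x(p) + xlog2x(1 - p)

capacity_08 = bsc_capacity(0.8)     # the channel with p = .8
capacity_05 = bsc_capacity(0.5)     # pure noise: nothing gets through
capacity_10 = bsc_capacity(1.0)     # noiseless channel
```

For p = .8 the capacity is roughly 0.278 bits per signal, which is well above the rate 1/17 used by the 17-fold repetition scheme.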


Summary 


The Poisson process having rate $\lambda$ is a collection of random variables $\{N(t), t \ge 0\}$
that relate to an underlying process of randomly occurring events. For instance, N(t)
represents the number of events that occur between times 0 and t. The defining
features of the Poisson process are as follows:


i. The number of events that occur in disjoint time intervals are independent.
ii. The distribution of the number of events that occur in an interval depends only
on the length of the interval.
iii. Events occur one at a time.
iv. Events occur at rate $\lambda$.


It can be shown that N(t) is a Poisson random variable with mean $\lambda t$. In addition, if
$T_i$, $i \ge 1$, are the times between the successive events, then they are independent
exponential random variables with rate $\lambda$.


A sequence of random variables $X_n$, $n \ge 0$, each of which takes on one of the values
0, ..., M, is said to be a Markov chain with transition probabilities $P_{i,j}$ if, for all
$n, i_0, \ldots, i_{n-1}, i, j$,
\[ P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0\} = P_{i,j} \]

If we interpret $X_n$ as the state of some process at time n, then a Markov chain is a
sequence of successive states of a process that has the property that whenever it
enters state i, then, independently of all past states, the next state is j with
probability $P_{i,j}$, for all states i and j. For many Markov chains, the probability of
being in state j at time n converges to a limiting value that does not depend on the
initial state. If we let $\pi_j$, j = 0, ..., M, denote these limiting probabilities, then they are
the unique solution of the equations
\[ \pi_j = \sum_{i=0}^{M} \pi_i P_{i,j}, \quad j = 0, \ldots, M, \qquad \sum_{j=0}^{M} \pi_j = 1 \]

Moreover, $\pi_j$ is equal to the long-run proportion of time that the chain is in state j.


Let X be a random variable that takes on one of n possible values according to the
set of probabilities $\{p_1, \ldots, p_n\}$. The quantity
\[ H(X) = -\sum_{i=1}^{n} p_i \log_2(p_i) \]
is called the entropy of X. It can be interpreted as representing either the average
amount of uncertainty that exists regarding the value of X or the average information
received when X is observed. Entropy has important implications for binary codings
of X.


Problems and Theoretical Exercises 


9.1. Customers arrive at a bank at a Poisson rate $\lambda$. Suppose that two
customers arrived during the first hour. What is the probability that 

a. both arrived during the first 20 minutes? 

b. at least one arrived during the first 20 minutes? 


9.2. Cars cross a certain point in the highway in accordance with a 
Poisson process with rate $\lambda = 3$ per minute. If Al runs blindly across the
highway, what is the probability that he will be uninjured if the amount 
of time that it takes him to cross the road is s seconds? (Assume that if 
he is on the highway when a car passes by, then he will be injured.) Do 
this exercise for s = 2,5, 10, 20. 

9.3. Suppose that in Problem 9.2, Al is agile enough to escape
from a single car, but if he encounters two or more cars while 
attempting to cross the road, then he is injured. What is the probability 
that he will be unhurt if it takes him s seconds to cross? Do this 
exercise for s = 5, 10, 20, 30. 

9.4. Suppose that 3 white and 3 black balls are distributed in two urns 
in such a way that each urn contains 3 balls. We say that the system is 
in state i if the first urn contains i white balls, i = 0,1,2,3. At each stage, 
1 ball is drawn from each urn and the ball drawn from the first urn is 
placed in the second, and conversely with the ball from the second urn. 
Let $X_n$ denote the state of the system after the nth stage, and compute
the transition probabilities of the Markov chain $\{X_n, n \ge 0\}$.

9.5. Consider Example 2a. If there is a 50-50 chance of rain today,
compute the probability that it will rain 3 days from now if $\alpha = .7$ and
$\beta = .3$.
9.6. Compute the limiting probabilities for the model of Problem 9.4.


9.7. A transition probability matrix is said to be doubly stochastic if
\[ \sum_{i=0}^{M} P_{i,j} = 1 \]
for all states j = 0, 1, ..., M. Show that if such a Markov chain is ergodic,
then $\pi_j = 1/(M + 1)$, j = 0, 1, ..., M.


9.8. On any given day, Buffy is either cheerful (c), so-so (s), or gloomy 
(g). If she is cheerful today, then she will be c, s, or g tomorrow with 
respective probabilities .7, .2, and .1. If she is so-so today, then she will 
be c, s, or g tomorrow with respective probabilities .4, .3, and .3. If she 
is gloomy today, then Buffy will be c, s, or g tomorrow with probabilities 
.2, .4, and .4. What proportion of time is Buffy cheerful? 
9.9. Suppose that whether it rains tomorrow depends on past weather 
conditions only through the past 2 days. Specifically, suppose that if it 
has rained yesterday and today, then it will rain tomorrow with 
probability .8; if it rained yesterday but not today, then it will rain 
tomorrow with probability .3; if it rained today but not yesterday, then it 
will rain tomorrow with probability .4; and if it has not rained either 
yesterday or today, then it will rain tomorrow with probability .2. What 
proportion of days does it rain? 
9.10. A certain person goes for a run each morning. When he leaves 
his house for his run, he is equally likely to go out either the front or the 
back door, and similarly, when he returns, he is equally likely to go to 
either the front or the back door. The runner owns 5 pairs of running 
shoes, which he takes off after the run at whichever door he happens 
to be. If there are no shoes at the door from which he leaves to go 
running, he runs barefooted. We are interested in determining the 
proportion of time that he runs barefooted. 

a. Set this problem up as a Markov chain. Give the states and the 

transition probabilities. 
b. Determine the proportion of days that he runs barefooted. 


9.11. This problem refers to Example 2f.
a. Verify that the proposed value of $\pi_j$ satisfies the necessary
equations.

b. For any given molecule, what do you think is the (limiting) 
probability that it is in urn 1? 

c. Do you think that the events that molecule j, $j \ge 1$, is in urn 1 at
a very large time would be (in the limit) independent?

d. Explain why the limiting probabilities are as given.

9.12. Determine the entropy of the sum that is obtained when a pair of
fair dice is rolled.

9.13. Prove that if X can take on any of n possible values with
respective probabilities $P_1, \ldots, P_n$, then H(X) is maximized when
$P_i = 1/n$, i = 1, ..., n. What is H(X) equal to in this case?

9.14. A pair of fair dice is rolled. Let
\[
X = \begin{cases} 1 & \text{if the sum of the dice is 6} \\ 0 & \text{otherwise} \end{cases}
\]
and let Y equal the value of the first die. Compute (a) H(Y), (b) $H_Y(X)$,
and (c) H(X, Y).

9.15. A coin having probability $p = \tfrac{2}{3}$ of coming up heads is flipped 6
times. Compute the entropy of the outcome of this experiment. 

9.16. A random variable can take on any of n possible values $x_1, \ldots, x_n$
with respective probabilities $p(x_i)$, i = 1, ..., n. We shall attempt to
determine the value of X by asking a series of questions, each of which
can be answered “yes” or “no.” For instance, we may ask “Is $X = x_1$?”
or “Is X equal to either $x_1$ or $x_2$ or $x_3$?” and so on. What can you say
about the average number of such questions that you will need to ask 
to determine the value of X? 

9.17. Show that for any discrete random variable X and function f, 


\[ H(f(X)) \le H(X) \]


9.18. In transmitting a bit from location A to location B, if we let X 
denote the value of the bit sent at location A and Y denote the value 
received at location B, then H(X) — Hy(X) is called the rate of 
transmission of information from A to B. The maximal rate of 
transmission, as a function of P{X = 1} = 1 — P{X = 0}, is called the 
channel capacity. Show that for a binary symmetric channel with 

P{Y =1|X =1} = P{Y = 0|X = 0} = p, the channel capacity is attained 
by the rate of transmission of information when $P\{X = 1\} = \tfrac{1}{2}$ and its
value is $1 + p \log p + (1 - p) \log(1 - p)$.


Self-Test Problems and Exercises 


9.1. Events occur according to a Poisson process with rate $\lambda = 3$ per
hour.
a. What is the probability that no events occur between times 8
and 10 in the morning?
b. What is the expected value of the number of events that occur
between times 8 and 10 in the morning?
c. What is the expected time of occurrence of the fifth event after
2 P.M.?


9.2. Customers arrive at a certain retail establishment according to a 
Poisson process with rate $\lambda$ per hour. Suppose that two customers
arrive during the first hour. Find the probability that 

a. both arrived in the first 20 minutes; 

b. at least one arrived in the first 30 minutes. 


9.3. Four out of every five trucks on the road are followed by a car, 
while one out of every six cars is followed by a truck. What proportion 
of vehicles on the road are trucks? 
9.4. A certain town’s weather is classified each day as being rainy, 
sunny, or overcast, but dry. If it is rainy one day, then it is equally 
likely to be either sunny or overcast the following day. If it is not rainy, 
then there is one chance in three that the weather will persist in 
whatever state it is in for another day, and if it does change, then it is 
equally likely to become either of the other two states. In the long 
run, what proportion of days are sunny? What proportion are rainy? 
9.5. Let X be a random variable that takes on 5 possible values with 
respective probabilities .35, .2, .2, .2, and .05. Also, let Y be a
random variable that takes on 5 possible values with respective 
probabilities .05, .35, .1, .15, and .35. 

a. Show that H(X) > H(Y). 

b. Using the result of Problem 9.13, give an intuitive
explanation for the preceding inequality.


Chapter 10 Simulation


Contents 


10.1 Introduction 
10.2 General Techniques for Simulating Continuous Random Variables 
10.3 Simulating from Discrete Distributions 


10.4 Variance Reduction Techniques 


10.1 Introduction 


How can we determine the probability of our winning a game of solitaire? (By 
solitaire, we mean any one of the standard solitaire games played with an ordinary 
deck of 52 playing cards and with some fixed playing strategy.) One possible 
approach is to start with the reasonable hypothesis that all (52)! possible 
arrangements of the deck of cards are equally likely to occur and then attempt to 
determine how many of these lead to a win. Unfortunately, there does not appear to 
be any systematic method for determining the number of arrangements that lead to a 
win, and as (52)! is a rather large number and the only way to determine whether a 
particular arrangement leads to a win seems to be by playing the game out, it can be 
seen that this approach will not work. 


In fact, it might appear that the determination of the probability of winning at solitaire 
is mathematically intractable. However, all is not lost, for probability falls not only 
within the realm of mathematics, but also within the realm of applied science; and, as 
in all applied sciences, experimentation is a valuable technique. For our solitaire 
example, experimentation takes the form of playing a large number of such games 
or, better yet, programming a computer to do so. After playing, say, n games, if we let 


\[
X_i = \begin{cases} 1 & \text{if the } i\text{th game results in a win} \\ 0 & \text{otherwise} \end{cases}
\]
then $X_i$, i = 1, ..., n will be independent Bernoulli random variables for which
\[ E[X_i] = P\{\text{win at solitaire}\} \]
Hence, by the strong law of large numbers, we know that
\[ \sum_{i=1}^{n} \frac{X_i}{n} = \frac{\text{number of games won}}{\text{number of games played}} \]
will, with probability 1, converge to P{win at solitaire}. That is, by playing a large
number of games, we can use the proportion of games won as an estimate of the 
probability of winning. This method of empirically determining probabilities by means 
of experimentation is known as simulation. 
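
The estimation idea can be sketched in code with a game that is much simpler than solitaire. In the example below, `play_game` is a hypothetical stand-in in which a "win" means that two fair dice sum to 7 (true probability 1/6); programming an actual solitaire strategy would replace only that function. The seed and the number of games are arbitrary choices.

```python
import random

def play_game(rng):
    """Hypothetical stand-in for one game: a win means two fair dice sum to 7."""
    return rng.randint(1, 6) + rng.randint(1, 6) == 7

def estimate_win_probability(n_games, seed=0):
    """Proportion of games won in n_games independent plays."""
    rng = random.Random(seed)
    wins = sum(play_game(rng) for _ in range(n_games))
    return wins / n_games

estimate = estimate_win_probability(200_000)
```

With this many games the estimate should land close to 1/6, in line with the strong law of large numbers.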


In order to use a computer to initiate a simulation study, we must be able to generate 
the value of a uniform (0, 1) random variable; such variates are called random 
numbers. To generate them, most computers have a built-in subroutine, called a 
random-number generator, whose output is a sequence of pseudorandom 
numbers—a sequence of numbers that is, for all practical purposes, indistinguishable 
from a sample from the uniform (0, 1) distribution. Most random-number generators 
start with an initial value $X_0$, called the seed, and then recursively compute values by
specifying positive integers a, c, and m, and then letting
\[ X_{n+1} = (aX_n + c) \text{ modulo } m, \qquad n \ge 0 \tag{1.1} \]
where the foregoing means that $aX_n + c$ is divided by m and the remainder is taken
as the value of $X_{n+1}$. Thus, each $X_n$ is either 0, 1, ..., m - 1, and the quantity $X_n/m$ is
taken as an approximation to a uniform (0, 1) random variable. It can be shown that
subject to suitable choices for a, c, and m, Equation (1.1) gives rise to a
sequence of numbers that look as if they were generated from independent uniform
(0, 1) random variables.
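
The recursion of Equation (1.1) takes only a few lines to implement. The sketch below is for illustration: the default constants a = 1664525, c = 1013904223, m = 2^32 are a commonly quoted choice and are an assumption here, not values given in the text, and a generator this simple should not be relied on for serious work.

```python
def lcg(seed, a, c, m):
    """Yield X_1, X_2, ... from the recursion X_{n+1} = (a * X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def random_numbers(seed, count, a=1664525, c=1013904223, m=2 ** 32):
    """Return count pseudorandom numbers X_n / m in [0, 1).
    The default a, c, m are illustrative constants, not taken from the text."""
    gen = lcg(seed, a, c, m)
    return [next(gen) / m for _ in range(count)]

# Tiny example that can be followed by hand: a = 5, c = 3, m = 16, X_0 = 7.
small = lcg(7, 5, 3, 16)
first_three = [next(small) for _ in range(3)]
```

In the small example, $X_1 = 38 \bmod 16 = 6$, $X_2 = 33 \bmod 16 = 1$, and $X_3 = 8$.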


As our starting point in simulation, we shall suppose that we can simulate from the 
uniform (0, 1) distribution, and we shall use the term random numbers to mean 
independent random variables from this distribution. 


In the solitaire example, we would need to program a computer to play out the game 
starting with a given ordering of the cards. However, since the initial ordering is 
supposed to be equally likely to be any of the (52)! possible permutations, it is also 
necessary to be able to generate a random permutation. Using only random 
numbers, the following algorithm shows how this can be accomplished. The
algorithm begins by randomly choosing one of the elements and then putting it in
position n; it then randomly chooses among the remaining elements and puts the 
choice in position n — 1, and so on. The algorithm efficiently makes a random choice 
among the remaining elements by keeping these elements in an ordered list and 
then randomly choosing a position on that list. 


Example 1a Generating a random permutation 


Suppose we are interested in generating a permutation of the integers 1, 2, ..., n
such that all n! possible orderings are equally likely. Then, starting with any initial
permutation, we will accomplish this after n - 1 steps, where we interchange the
positions of two of the numbers of the permutation at each step. Throughout, we
will keep track of the permutation by letting X(i), i = 1, ..., n denote the number
currently in position i. The algorithm operates as follows:


1. Consider any arbitrary permutation, and let X(i) denote the element in
position i, i = 1, ..., n. [For instance, we could take X(i) = i, i = 1, ..., n.]

2. Generate a random variable $N_n$ that is equally likely to equal any of the
values 1, 2, ..., n.

3. Interchange the values of $X(N_n)$ and X(n). The value of X(n) will now
remain fixed. [For instance, suppose that n = 4 and initially
X(i) = i, i = 1, 2, 3, 4. If $N_4 = 3$, then the new permutation is
X(1) = 1, X(2) = 2, X(3) = 4, X(4) = 3, and element 3 will remain in
position 4 throughout.]

4. Generate a random variable $N_{n-1}$ that is equally likely to be either 1, 2, ...,
n - 1.

5. Interchange the values of $X(N_{n-1})$ and X(n - 1). [If $N_3 = 1$, then the new
permutation is X(1) = 4, X(2) = 2, X(3) = 1, X(4) = 3.]

6. Generate $N_{n-2}$, which is equally likely to be either 1, 2, ..., n - 2.

7. Interchange the values of $X(N_{n-2})$ and X(n - 2). [If $N_2 = 1$, then the new
permutation is X(1) = 2, X(2) = 4, X(3) = 1, X(4) = 3, and this is the final
permutation.]

8. Generate $N_{n-3}$, and so on. The algorithm continues until $N_2$ is generated,
and after the next interchange the resulting permutation is the final one.


To implement this algorithm, it is necessary to be able to generate a random 
variable that is equally likely to be any of the values 1, 2, ..., k. To accomplish 
this, let U denote a random number—that is, U is uniformly distributed on (0, 1)— 
and note that kU is uniform on (0, k). Hence, 


\[ P\{i - 1 < kU < i\} = \frac{1}{k}, \qquad i = 1, \ldots, k \]

so if we take $N_k = [kU] + 1$, where [x] is the integer part of x (that is, the largest
integer less than or equal to x), then $N_k$ will have the desired distribution.


The algorithm can now be succinctly written as follows: 


Step 1. Let X(1), ..., X(n) be any permutation of 1, 2, ..., n. [For instance, we
can set X(i) = i, i = 1, ..., n.]

Step 2. Let I = n.

Step 3. Generate a random number U and set N = [IU] + 1.

Step 4. Interchange the values of X(N) and X(I).

Step 5. Reduce the value of I by 1, and if I > 1, go to step 3.

Step 6. X(1), ..., X(n) is the desired randomly generated permutation.
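
The six steps map almost line for line onto code. In the sketch below, the function name and the `uniform` parameter (a source of random numbers, defaulting to Python's `random.random`) are assumptions made for illustration; positions are 1-based in the text and 0-based in the list, hence the index shifts.

```python
import random

def random_permutation(n, uniform=random.random):
    """Generate a random permutation of 1, ..., n following Steps 1 to 6."""
    x = list(range(1, n + 1))          # Step 1: X(i) = i, stored at index i - 1
    i = n                              # Step 2: I = n
    while i > 1:
        u = uniform()                  # Step 3: random number U in [0, 1)
        pos = int(i * u) + 1           #         N = [I * U] + 1, a value in 1..I
        x[pos - 1], x[i - 1] = x[i - 1], x[pos - 1]    # Step 4: swap X(N), X(I)
        i -= 1                         # Step 5: reduce I and repeat while I > 1
    return x                           # Step 6

# Replay the worked example: n = 4 with N_4 = 3, N_3 = 1, N_2 = 1.
scripted = iter([0.6, 0.1, 0.1])       # U values that produce those N's
example = random_permutation(4, lambda: next(scripted))
```

Feeding in the scripted random numbers reproduces the final permutation X(1) = 2, X(2) = 4, X(3) = 1, X(4) = 3 from the worked example.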


The foregoing algorithm for generating a random permutation is extremely useful. 
For instance, suppose that a statistician is developing an experiment to compare 
the effects of m different treatments on a set of n subjects. He decides to split the 
subjects into m different groups of respective sizes n,, nz, ..., Nm», where 

xj, n; =n, with the members of the ith group to receive treatment i. To 
eliminate any bias in the assignment of subjects to treatments (for instance, it 
would cloud the meaning of the experimental results if it turned out that all the 
“best” subjects had been put in the same group), it is imperative that the 
assignment of a subject to a given group be done “at random.” How is this to be 
accomplished? 


† Another technique for randomly dividing the subjects when m = 2
was presented in Example 2g of Chapter 6. The preceding
procedure is faster, but requires more space than the one of
Example 2g.


A simple and efficient procedure is to arbitrarily number the subjects 1 through n
and then generate a random permutation X(1), ..., X(n) of 1, 2, ..., n. Now assign
subjects $X(1), X(2), \ldots, X(n_1)$ to be in group 1; $X(n_1 + 1), \ldots, X(n_1 + n_2)$ to be in
group 2; and, in general, group j is to consist of subjects numbered

$$X(n_1 + n_2 + \cdots + n_{j-1} + k), \qquad k = 1, \ldots, n_j.$$


10.2 General Techniques for Simulating Continuous Random Variables


In this section, we present two general methods for using random numbers to
simulate continuous random variables.


10.2.1 The Inverse Transformation Method 


A general method for simulating a random variable having a continuous distribution 
—called the inverse transformation method—is based on the following proposition. 


Proposition 2.1 


Let U be a uniform (0, 1) random variable. For any continuous distribution
function F, if we define the random variable Y by


$$Y = F^{-1}(U)$$


then the random variable Y has distribution function F. [$F^{-1}(x)$ is defined to equal
that value y for which F(y) = x.]


Proof

$$F_Y(a) = P\{Y \le a\} = P\{F^{-1}(U) \le a\} \qquad (2.1)$$

Now, since F(x) is a monotone function, it follows that $F^{-1}(U) \le a$ if and only if
$U \le F(a)$. Hence, from Equation (2.1), we have

$$F_Y(a) = P\{U \le F(a)\} = F(a)$$


It follows from Proposition 2.1 that we can simulate a random variable X having
a continuous distribution function F by generating a random number U and then
setting $X = F^{-1}(U)$.


Example 2a Simulating an exponential random variable 


If $F(x) = 1 - e^{-x}$, then $F^{-1}(u)$ is that value of x such that

$$1 - e^{-x} = u$$

or

$$x = -\log(1 - u)$$

Hence, if U is a uniform (0, 1) variable, then

$$F^{-1}(U) = -\log(1 - U)$$

is exponentially distributed with mean 1. Since 1 − U is also uniformly distributed
on (0, 1), it follows that −log U is exponential with mean 1. Since cX is
exponential with mean c when X is exponential with mean 1, it follows that
−c log U is exponential with mean c.
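
As an illustration of Example 2a, the following minimal Python sketch generates an exponential random variable with mean c as −c log U. The function name `exponential` and its parameters are assumptions made for this sketch; it uses 1 − U in place of U (which has the same distribution) only so that the logarithm is never evaluated at 0.

```python
import math
import random


def exponential(mean=1.0, rng=random):
    """Inverse transform: -mean * log(U) is exponential with the given mean."""
    u = 1.0 - rng.random()     # lies in (0, 1], so log(u) is always defined
    return -mean * math.log(u)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [exponential(2.0, rng) for _ in range(10000)]
    print(sum(draws) / len(draws))   # sample mean, should be close to 2
```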


The results of Example 2a can also be utilized to simulate a gamma random
variable.


Example 2b Simulating a gamma (n, λ) random variable


To simulate from a gamma distribution with parameters (n, λ) when n is an
integer, we use the fact that the sum of n independent exponential random
variables, each having rate λ, has this distribution. Hence, if $U_1, \ldots, U_n$ are
independent uniform (0, 1) random variables, then

$$X = -\sum_{i=1}^{n} \frac{1}{\lambda} \log U_i = -\frac{1}{\lambda} \log\left(\prod_{i=1}^{n} U_i\right)$$

has the desired distribution.
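
A short Python sketch of Example 2b, under the assumption that n is a positive integer and λ > 0. The name `gamma_integer_shape` is hypothetical; the function multiplies n random numbers and takes a single logarithm, as the product form of the formula suggests.

```python
import math
import random


def gamma_integer_shape(n, lam, rng=random):
    """Gamma (n, lam) variable as -(1/lam) * log(U_1 * ... * U_n)."""
    product = 1.0
    for _ in range(n):
        product *= 1.0 - rng.random()   # uniform on (0, 1], avoids log(0)
    return -math.log(product) / lam


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [gamma_integer_shape(3, 2.0, rng) for _ in range(10000)]
    print(sum(draws) / len(draws))      # should be close to n / lam = 1.5
```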


10.2.2 The Rejection Method 


Suppose that we have a method for simulating a random variable having density 
function g(x). We can use this method as the basis for simulating from the 
continuous distribution having density f(x) by simulating Y from g and then 
accepting the simulated value with a probability proportional to f(Y)/g(Y). 


Specifically, let c be a constant such that


$$\frac{f(y)}{g(y)} \le c \qquad \text{for all } y$$


We then have the following technique for simulating a random variable having 
density f. 


Rejection Method


Step 1. Simulate Y having density g and simulate a random number U.
Step 2. If U ≤ f(Y)/cg(Y), set X = Y. Otherwise return to step 1.

The rejection method is expressed pictorially in Figure 10.1. We now prove that it
works.


Figure 10.1 Rejection method for simulating a random variable X having
density function f. [Flowchart: Start; generate Y ~ g; generate a random number U.]

Proposition 2.2 


The random variable X generated by the rejection method has density function f. 


Proof Let X be the value obtained and let N denote the number of necessary
iterations. Then

$$P\{X \le x\} = P\{Y_N \le x\}$$
$$= P\left\{Y \le x \,\middle|\, U \le \frac{f(Y)}{cg(Y)}\right\}$$
$$= \frac{P\left\{Y \le x,\; U \le \frac{f(Y)}{cg(Y)}\right\}}{K}$$

where K = P{U ≤ f(Y)/cg(Y)}. Now, by independence, the joint density function
of Y and U is

$$f(y, u) = g(y), \qquad 0 < u < 1$$

so, using the foregoing, we have

$$P\{X \le x\} = \frac{1}{K} \iint_{\substack{y \le x \\ 0 \le u \le f(y)/cg(y)}} g(y)\, du\, dy \qquad (2.2)$$
$$= \frac{1}{K} \int_{-\infty}^{x} \int_{0}^{f(y)/cg(y)} du\; g(y)\, dy$$
$$= \frac{1}{cK} \int_{-\infty}^{x} f(y)\, dy$$

Letting x approach ∞ and using the fact that f is a density gives

$$1 = \frac{1}{cK} \int_{-\infty}^{\infty} f(y)\, dy = \frac{1}{cK}$$

Hence, from Equation (2.2), we obtain

$$P\{X \le x\} = \int_{-\infty}^{x} f(y)\, dy$$

which completes the proof.


Remarks 


a. Note that the way in which we “accept the value Y with probability f(Y)/cg(Y)” 
is by generating a random number U and then accepting Y if U < f(Y)/cg(Y). 

b. Since each iteration will independently result in an accepted value with 
probability P{U < f(Y)/cg(Y)} = K = 1/c, it follows that the number of 
iterations has a geometric distribution with mean c. 


Example 2c Simulating a normal random variable 


To simulate a unit normal random variable Z (that is, one with mean 0 and
variance 1), note first that the absolute value of Z has probability density function

$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad 0 < x < \infty \qquad (2.3)$$

We will start by simulating from the preceding density function by using the
rejection method, with g being the exponential density function with mean 1, that
is,

$$g(x) = e^{-x}, \qquad 0 < x < \infty$$

Now, note that

$$\frac{f(x)}{g(x)} = \sqrt{\frac{2}{\pi}} \exp\left\{\frac{-(x^2 - 2x)}{2}\right\} \qquad (2.4)$$
$$= \sqrt{\frac{2}{\pi}} \exp\left\{\frac{-(x^2 - 2x + 1)}{2} + \frac{1}{2}\right\}$$
$$= \sqrt{\frac{2e}{\pi}} \exp\left\{\frac{-(x - 1)^2}{2}\right\}$$
$$\le \sqrt{\frac{2e}{\pi}}$$

Hence, we can take $c = \sqrt{2e/\pi}$; so, from Equation (2.4),

$$\frac{f(x)}{cg(x)} = \exp\left\{\frac{-(x - 1)^2}{2}\right\}$$

Therefore, using the rejection method, we can simulate the absolute value of a
unit normal random variable as follows:


(a) Generate independent random variables Y and U, Y being exponential with
rate 1 and U being uniform on (0, 1).
(b) If $U \le \exp\{-(Y - 1)^2/2\}$, set X = Y. Otherwise, return to (a).


Once we have simulated a random variable X having Equation (2.3) as its 
density function, we can then generate a unit normal random variable Z by letting 
Z be equally likely to be either X or —X. 


In step (b), the value Y is accepted if $U \le \exp\{-(Y - 1)^2/2\}$, which is equivalent
to $-\log U \ge (Y - 1)^2/2$. However, in Example 2a, it was shown that −log U is
exponential with rate 1, so steps (a) and (b) are equivalent to


(a') Generate independent exponentials $Y_1$ and $Y_2$, each with rate 1.
(b') If $Y_2 \ge (Y_1 - 1)^2/2$, set $X = Y_1$. Otherwise, return to (a').


Suppose now that the foregoing results in $Y_1$ being accepted, so we know that
$Y_2$ is larger than $(Y_1 - 1)^2/2$. By how much does the one exceed the other? To
answer this question, recall that $Y_2$ is exponential with rate 1; hence, given that it
exceeds some value, the amount by which $Y_2$ exceeds $(Y_1 - 1)^2/2$ [that is, its
“additional life” beyond the time $(Y_1 - 1)^2/2$] is (by the memoryless property)
also exponentially distributed with rate 1. That is, when we accept step (b'), not
only do we obtain X (the absolute value of a unit normal), but, by computing
$Y_2 - (Y_1 - 1)^2/2$, we also can generate an exponential random variable (that is
independent of X) having rate 1.


Summing up, then, we have the following algorithm that generates an 
exponential with rate 1 and an independent unit normal random variable: 


Step 1. Generate $Y_1$, an exponential random variable with rate 1.

Step 2. Generate $Y_2$, an exponential random variable with rate 1.

Step 3. If $Y_2 - (Y_1 - 1)^2/2 > 0$, set $Y = Y_2 - (Y_1 - 1)^2/2$ and go to step 4.
Otherwise, go to step 1.

Step 4. Generate a random number U, and set

$$Z = \begin{cases} Y_1 & \text{if } U \le \frac{1}{2} \\ -Y_1 & \text{if } U > \frac{1}{2} \end{cases}$$

The random variables Z and Y generated by the foregoing algorithm are
independent, with Z being normal with mean 0 and variance 1 and Y being
exponential with rate 1. (If we want the normal random variable to have mean μ
and variance σ², we just take μ + σZ.)
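
The four-step algorithm can be sketched in Python as below. The function name `normal_and_exponential` is an assumption, and the exponentials are produced as −log U in the manner of Example 2a; the sketch returns the pair (Z, Y).

```python
import math
import random


def normal_and_exponential(rng=random):
    """Return (Z, Y): a unit normal Z and an independent rate-1 exponential Y."""
    while True:
        y1 = -math.log(1.0 - rng.random())     # Step 1: exponential, rate 1
        y2 = -math.log(1.0 - rng.random())     # Step 2: exponential, rate 1
        excess = y2 - (y1 - 1.0) ** 2 / 2.0    # Step 3: accept if positive
        if excess > 0:
            # Step 4: attach a random sign to the accepted absolute value
            z = y1 if rng.random() <= 0.5 else -y1
            return z, excess


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [normal_and_exponential(rng) for _ in range(10000)]
    print(sum(z for z, _ in pairs) / len(pairs))   # should be close to 0
```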


Remarks 


a. Since $c = \sqrt{2e/\pi} \approx 1.32$, the algorithm requires a geometrically distributed
number of iterations of step 2 with mean 1.32.

b. If we want to generate a sequence of unit normal random variables, then
we can use the exponential random variable Y obtained in step 3 as the
initial exponential needed in step 1 for the next normal to be generated.
Hence, on the average, we can simulate a unit normal by generating
1.64 (= 2 × 1.32 − 1) exponentials and computing 1.32 squares.


Example 2d Simulating normal random variables: the polar method 


It was shown in Example 7b of Chapter 6 that if X and Y are independent unit
normal random variables, then their polar coordinates
$R = \sqrt{X^2 + Y^2}$, $\Theta = \tan^{-1}(Y/X)$ are independent, with $R^2$ being exponentially
distributed with mean 2 and $\Theta$ being uniformly distributed on $(0, 2\pi)$. Hence, if $U_1$
and $U_2$ are random numbers, then, using the result of Example 2a, we can
set

$$R = (-2\log U_1)^{1/2}$$
$$\Theta = 2\pi U_2$$

from which it follows that

$$X = R\cos\Theta = (-2\log U_1)^{1/2}\cos(2\pi U_2) \qquad (2.5)$$
$$Y = R\sin\Theta = (-2\log U_1)^{1/2}\sin(2\pi U_2)$$

are independent unit normals.


The preceding approach to generating unit normal random variables is called the
Box-Muller approach. Its efficiency suffers somewhat from its need to compute the
sine and cosine values. There is, however, a way to get around this potentially
time-consuming difficulty. To begin, note that if U is uniform on (0, 1), then 2U is uniform
on (0, 2), so 2U − 1 is uniform on (−1, 1). Thus, if we generate random numbers $U_1$
and $U_2$ and set

$$V_1 = 2U_1 - 1$$
$$V_2 = 2U_2 - 1$$

then $(V_1, V_2)$ is uniformly distributed in the square of area 4 centered at (0, 0). (See
Figure 10.2.)


Figure 10.2 [Diagram: the square with corners (±1, ±1), its center • = (0, 0), and a point × = $(V_1, V_2)$.]


Suppose now that we continually generate such pairs $(V_1, V_2)$ until we obtain one
that is contained in the disk of radius 1 centered at (0, 0), that is, until $V_1^2 + V_2^2 \le 1$. It
then follows that such a pair $(V_1, V_2)$ is uniformly distributed in the disk. Now, let $R, \Theta$
denote the polar coordinates of this pair. Then it is easy to verify that R and $\Theta$ are
independent, with $R^2$ being uniformly distributed on (0, 1) and $\Theta$ being uniformly
distributed on $(0, 2\pi)$. (See Problem 10.13.)


Since

$$\sin\Theta = \frac{V_2}{R} = \frac{V_2}{\sqrt{V_1^2 + V_2^2}}$$
$$\cos\Theta = \frac{V_1}{R} = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}$$

it follows from Equation (2.5) that we can generate independent unit normals X
and Y by generating another random number U and setting

$$X = (-2\log U)^{1/2}\, V_1/R$$
$$Y = (-2\log U)^{1/2}\, V_2/R$$

In fact, because (conditional on $V_1^2 + V_2^2 \le 1$) $R^2$ is uniform on (0, 1) and is
independent of $\Theta$, we can use it instead of generating a new random number U, thus
showing that

$$X = (-2\log R^2)^{1/2}\, \frac{V_1}{R} = \sqrt{\frac{-2\log S}{S}}\; V_1$$
$$Y = (-2\log R^2)^{1/2}\, \frac{V_2}{R} = \sqrt{\frac{-2\log S}{S}}\; V_2$$

are independent unit normals, where

$$S = R^2 = V_1^2 + V_2^2$$


Summing up, we have the following approach to generating a pair of independent 
unit normals: 


Step 1. Generate random numbers $U_1$ and $U_2$.

Step 2. Set $V_1 = 2U_1 - 1$, $V_2 = 2U_2 - 1$, $S = V_1^2 + V_2^2$.

Step 3. If S > 1, return to step 1.

Step 4. Return the independent unit normals

$$X = \sqrt{\frac{-2\log S}{S}}\; V_1, \qquad Y = \sqrt{\frac{-2\log S}{S}}\; V_2$$


The preceding algorithm is called the polar method. Since the probability that a
random point in the square will fall within the circle is equal to π/4 (the area of the
circle divided by the area of the square), it follows that, on average, the polar method
will require 4/π ≈ 1.273 iterations of step 1. Hence, it will, on average, require 2.546
random numbers, 1 logarithm, 1 square root, 1 division, and 4.546 multiplications to
generate 2 independent unit normals.
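
A minimal Python sketch of the polar method follows. The name `polar_normals` is hypothetical. One detail is an addition not stated in the steps above: the sketch also rejects S = 0, an event of probability essentially zero, so that log S and the division by S are always defined.

```python
import math
import random


def polar_normals(rng=random):
    """Return a pair of independent unit normals via the polar method."""
    while True:
        v1 = 2.0 * rng.random() - 1.0        # Steps 1-2: uniform on (-1, 1)
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s <= 1.0:                   # Step 3: keep points in the disk
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return factor * v1, factor * v2  # Step 4


if __name__ == "__main__":
    print(polar_normals(random.Random(0)))
```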


Example 2e Simulating a chi-squared random variable 
The chi-squared distribution with n degrees of freedom is the distribution of
$\chi_n^2 = Z_1^2 + \cdots + Z_n^2$, where $Z_i$, i = 1, ..., n are independent unit normals. Now, it
was shown in Section 6.3 of Chapter 6 that $Z_1^2 + Z_2^2$ has an exponential
distribution with rate 1/2. Hence, when n is even (say, n = 2k), $\chi_{2k}^2$ has a gamma
distribution with parameters (k, 1/2). Thus, $-2\log\left(\prod_{i=1}^{k} U_i\right)$ has a chi-squared
distribution with 2k degrees of freedom. Accordingly, we can simulate a
chi-squared random variable with 2k + 1 degrees of freedom by first simulating a unit
normal random variable Z and then adding $Z^2$ to the foregoing. That is,

$$\chi_{2k+1}^2 = Z^2 - 2\log\left(\prod_{i=1}^{k} U_i\right)$$

where $Z, U_1, \ldots, U_k$ are independent, Z is a unit normal, and $U_1, \ldots, U_k$ are
uniform (0, 1) random variables.
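
The construction in Example 2e can be sketched as follows. The function name `chi_squared` is an assumption, and for brevity the unit normal Z is drawn with the standard library's `random.gauss` rather than with the polar method described earlier.

```python
import math
import random


def chi_squared(df, rng=random):
    """Chi-squared variable with df degrees of freedom (df a positive integer)."""
    k, odd = divmod(df, 2)
    product = 1.0
    for _ in range(k):
        product *= 1.0 - rng.random()    # uniform on (0, 1], avoids log(0)
    value = -2.0 * math.log(product)     # chi-squared with 2k degrees of freedom
    if odd:
        z = rng.gauss(0.0, 1.0)          # library normal generator (assumption)
        value += z * z                   # add Z^2 for the odd case
    return value


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [chi_squared(5, rng) for _ in range(10000)]
    print(sum(draws) / len(draws))       # should be close to 5
```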


10.3 Simulating from Discrete Distributions 


All of the general methods for simulating random variables from continuous
distributions have analogs in the discrete case. For instance, if we want to simulate a
random variable X having probability mass function

$$P\{X = x_j\} = P_j, \quad j = 1, 2, \ldots, \qquad \sum_j P_j = 1$$

we can use the following discrete time analog of the inverse transform technique:


To simulate X for which $P\{X = x_j\} = P_j$, let U be uniformly distributed over (0, 1) and
set

$$X = \begin{cases} x_1 & \text{if } U < P_1 \\ x_2 & \text{if } P_1 < U < P_1 + P_2 \\ \;\vdots \\ x_j & \text{if } \sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i \\ \;\vdots \end{cases}$$

Since

$$P\{X = x_j\} = P\left\{\sum_{i=1}^{j-1} P_i < U < \sum_{i=1}^{j} P_i\right\} = P_j$$

it follows that X has the desired distribution.
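
For a finite list of values, the discrete inverse transform can be sketched in Python as below. The name `discrete_inverse_transform` and the example values are assumptions. The binary search over cumulative sums is an implementation choice; it selects the same interval as the case-by-case definition above.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def discrete_inverse_transform(values, probs, rng=random):
    """Return values[j] with probability probs[j]."""
    cumulative = list(accumulate(probs))   # P_1, P_1 + P_2, ...
    u = rng.random()
    # Number of cumulative sums <= u, i.e. the first j with u < P_1 + ... + P_j
    idx = bisect_right(cumulative, u)
    # Guard against round-off when the probabilities sum to slightly below 1
    return values[min(idx, len(values) - 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [discrete_inverse_transform(["a", "b", "c"], [0.2, 0.3, 0.5], rng)
             for _ in range(10000)]
    print(draws.count("c") / len(draws))   # should be close to 0.5
```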


Example 3a The geometric distribution 


Suppose that independent trials, each of which results in a “success” with
probability p, 0 < p < 1, are continually performed until a success occurs. Letting
X denote the necessary number of trials, then

$$P\{X = i\} = (1 - p)^{i-1} p, \qquad i \ge 1$$

which is seen by noting that X = i if the first i − 1 trials are all failures and the ith
trial is a success. The random variable X is said to be a geometric random
variable with parameter p. Since

$$\sum_{i=1}^{j-1} P\{X = i\} = 1 - P\{X > j - 1\}$$
$$= 1 - P\{\text{first } j - 1 \text{ are all failures}\}$$
$$= 1 - (1 - p)^{j-1}, \qquad j \ge 1$$

we can simulate such a random variable by generating a random number U and
then setting X equal to that value j for which

$$1 - (1 - p)^{j-1} < U < 1 - (1 - p)^{j}$$

or, equivalently, for which

$$(1 - p)^{j} < 1 - U < (1 - p)^{j-1}$$

Since 1 − U has the same distribution as U, we can define X by

$$X = \min\{j : (1 - p)^{j} < U\}$$
$$= \min\{j : j\log(1 - p) < \log U\}$$
$$= \min\left\{j : j > \frac{\log U}{\log(1 - p)}\right\}$$

where the inequality has changed sign because log(1 − p) is negative [since
log(1 − p) < log 1 = 0]. Using the notation [x] for the integer part of x (that is, [x]
is the largest integer less than or equal to x), we can write

$$X = 1 + \left[\frac{\log U}{\log(1 - p)}\right]$$

As in the continuous case, special simulating techniques have been developed for 
the more common discrete distributions. We now present two of these. 


Example 3b Simulating a Binomial Random Variable 


A binomial (n, p) random variable can easily be simulated by recalling that it can
be expressed as the sum of n independent Bernoulli random variables. That is, if
$U_1, \ldots, U_n$ are independent uniform (0, 1) variables, then letting

$$X_i = \begin{cases} 1 & \text{if } U_i < p \\ 0 & \text{otherwise} \end{cases}$$

it follows that $X \equiv \sum_{i=1}^{n} X_i$ is a binomial random variable with parameters n and p.

Example 3c Simulating a Poisson Random Variable 


To simulate a Poisson random variable with mean λ, generate independent
uniform (0, 1) random variables $U_1, U_2, \ldots$ stopping at

$$N = \min\left\{n : \prod_{i=1}^{n} U_i < e^{-\lambda}\right\}$$

The random variable X ≡ N − 1 has the desired distribution. That is, if we
continue generating random numbers until their product falls below $e^{-\lambda}$, then the
number required, minus 1, is Poisson with mean λ.


That X ≡ N − 1 is indeed a Poisson random variable having mean λ can perhaps
be most easily seen by noting that

$$X + 1 = \min\left\{n : \prod_{i=1}^{n} U_i < e^{-\lambda}\right\}$$

is equivalent to

$$X = \max\left\{n : \prod_{i=1}^{n} U_i \ge e^{-\lambda}\right\} \qquad \text{where } \prod_{i=1}^{0} U_i \equiv 1$$

or, taking logarithms, to

$$X = \max\left\{n : \sum_{i=1}^{n} \log U_i \ge -\lambda\right\}$$

or

$$X = \max\left\{n : \sum_{i=1}^{n} -\log U_i \le \lambda\right\}$$

However, $-\log U_i$ is exponential with rate 1, so X can be thought of as being the
maximum number of exponentials having rate 1 that can be summed and still be
less than λ. But by recalling that the times between successive events of a
Poisson process having rate 1 are independent exponentials with rate 1, it
follows that X is equal to the number of events by time λ of a Poisson process
having rate 1; thus, X has a Poisson distribution with mean λ.
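
The stopping rule of Example 3c can be sketched in Python as follows; the function name `poisson` is an assumption made for this illustration.

```python
import math
import random


def poisson(lam, rng=random):
    """Poisson variable with mean lam via products of random numbers."""
    threshold = math.exp(-lam)
    product = 1.0
    n = 0
    while True:
        product *= rng.random()   # multiply in the next random number
        n += 1
        if product < threshold:   # N = first n with product below e^(-lam)
            return n - 1          # X = N - 1


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [poisson(3.0, rng) for _ in range(10000)]
    print(sum(draws) / len(draws))   # should be close to 3
```

Because the expected number of random numbers used is λ + 1, this approach is most practical for moderate values of λ.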


10.4 Variance Reduction Techniques 


Let $X_1, \ldots, X_n$ have a given joint distribution, and suppose that we are interested in
computing

$$\theta \equiv E[g(X_1, \ldots, X_n)]$$

where g is some specified function. It sometimes turns out that it is extremely difficult
to analytically compute θ, and when such is the case, we can attempt to use
simulation to estimate θ. This is done as follows: Generate $X_1^{(1)}, \ldots, X_n^{(1)}$ having the
same joint distribution as $X_1, \ldots, X_n$ and set

$$Y_1 = g(X_1^{(1)}, \ldots, X_n^{(1)})$$

Now let $X_1^{(2)}, \ldots, X_n^{(2)}$ simulate a second set of random variables (independent of the
first set) having the distribution of $X_1, \ldots, X_n$, and set

$$Y_2 = g(X_1^{(2)}, \ldots, X_n^{(2)})$$

Continue this until you have generated k (some predetermined number) sets and so
have also computed $Y_1, Y_2, \ldots, Y_k$. Now, $Y_1, \ldots, Y_k$ are independent and identically
distributed random variables, each having the same distribution as $g(X_1, \ldots, X_n)$.
Thus, if we let $\bar{Y}$ denote the average of these k random variables, that is, if

$$\bar{Y} = \sum_{i=1}^{k} \frac{Y_i}{k}$$

then

$$E[\bar{Y}] = \theta$$
$$E[(\bar{Y} - \theta)^2] = \mathrm{Var}(\bar{Y})$$

Hence, we can use $\bar{Y}$ as an estimate of θ. Since the expected square of the
difference between $\bar{Y}$ and θ is equal to the variance of $\bar{Y}$, we would like this quantity
to be as small as possible. [In the preceding situation, $\mathrm{Var}(\bar{Y}) = \mathrm{Var}(Y_i)/k$, which is
usually not known in advance, but must be estimated from the generated values
$Y_1, \ldots, Y_k$.] We now present three general techniques for reducing the variance of
our estimator.


10.4.1 Use of Antithetic Variables 


In the foregoing situation, suppose that we have generated $Y_1$ and $Y_2$, which are
identically distributed random variables having mean θ. Now,

$$\mathrm{Var}\left(\frac{Y_1 + Y_2}{2}\right) = \frac{1}{4}\left[\mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + 2\,\mathrm{Cov}(Y_1, Y_2)\right]$$
$$= \frac{\mathrm{Var}(Y_1)}{2} + \frac{\mathrm{Cov}(Y_1, Y_2)}{2}$$

Hence, it would be advantageous (in the sense that the variance would be reduced)
if $Y_1$ and $Y_2$ were negatively correlated rather than being independent. To see how
we could arrange this, let us suppose that the random variables $X_1, \ldots, X_n$ are
independent and, in addition, that each is simulated via the inverse transform
technique. That is, $X_i$ is simulated from $F_i^{-1}(U_i)$, where $U_i$ is a random number and
$F_i$ is the distribution of $X_i$. Thus, $Y_1$ can be expressed as

$$Y_1 = g(F_1^{-1}(U_1), \ldots, F_n^{-1}(U_n))$$

Now, since 1 − U is also uniform over (0, 1) whenever U is a random number (and is
negatively correlated with U), it follows that $Y_2$ defined by

$$Y_2 = g(F_1^{-1}(1 - U_1), \ldots, F_n^{-1}(1 - U_n))$$

will have the same distribution as $Y_1$. Hence, if $Y_1$ and $Y_2$ were negatively correlated,
then generating $Y_2$ by this means would lead to a smaller variance than if it were
generated by a new set of random numbers. (In addition, there is a computational
savings because, rather than having to generate n additional random numbers, we
need only subtract each of the previous n numbers from 1.) Although we cannot, in
general, be certain that $Y_1$ and $Y_2$ will be negatively correlated, this often turns out to
be the case, and indeed it can be proven that it will be so whenever g is a monotonic
function.
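
To make the idea concrete, here is a small Python sketch that compares a plain average with an antithetic average for a function of a single random number. The function names and the test integrand $e^u$ (whose expected value over (0, 1) is e − 1) are assumptions chosen for illustration; since $e^u$ is monotone, the antithetic pairs are negatively correlated.

```python
import math
import random


def crude_estimate(g, n, rng):
    """Average of g(U) over n independent random numbers."""
    return sum(g(rng.random()) for _ in range(n)) / n


def antithetic_estimate(g, n_pairs, rng):
    """Average of [g(U) + g(1 - U)] / 2 over n_pairs random numbers."""
    total = 0.0
    for _ in range(n_pairs):
        u = rng.random()
        total += (g(u) + g(1.0 - u)) / 2.0   # reuse U as the antithetic 1 - U
    return total / n_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    print(crude_estimate(math.exp, 10000, rng))       # uses 10000 random numbers
    print(antithetic_estimate(math.exp, 5000, rng))   # uses only 5000
    print(math.e - 1.0)                               # exact value for comparison
```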


10.4.2 Variance Reduction by Conditioning 


Let us start by recalling the conditional variance formula (see Section 7.5.4) 


Var(Y) = E[Var(Y |Z)] + Var(E[Y |Z]) 


Now, suppose that we are interested in estimating $E[g(X_1, \ldots, X_n)]$ by simulating
$X = (X_1, \ldots, X_n)$ and then computing Y = g(X). If, for some random variable Z, we
can compute E[Y|Z], then, since Var(Y|Z) ≥ 0, it follows from the preceding
conditional variance formula that


Var(E[Y|Z]) ≤ Var(Y)


Thus, since E[E[Y|Z]] = E[Y], it follows that E[Y|Z] is a better estimator of E[Y] than
is Y.


Example 4a Estimation of π


Let $U_1$ and $U_2$ be random numbers and set $V_i = 2U_i - 1$, i = 1, 2. As noted in
Example 2d, $(V_1, V_2)$ will be uniformly distributed in the square of area 4
centered at (0, 0). The probability that this point will fall within the inscribed circle
of radius 1 centered at (0, 0) (see Figure 10.2) is equal to π/4 (the ratio of the
area of the circle to that of the square). Hence, upon simulating a large number n
of such pairs and setting

$$I_j = \begin{cases} 1 & \text{if the } j\text{th pair falls within the circle} \\ 0 & \text{otherwise} \end{cases}$$

it follows that $I_j$, j = 1, ..., n, will be independent and identically distributed
random variables having $E[I_j] = \pi/4$. Thus, by the strong law of large numbers,

$$\frac{I_1 + \cdots + I_n}{n} \to \frac{\pi}{4} \qquad \text{as } n \to \infty$$

Therefore, by simulating a large number of pairs $(V_1, V_2)$ and multiplying the
proportion of them that fall within the circle by 4, we can accurately approximate
π.


The preceding estimator can, however, be improved upon by using conditional
expectation. If we let I be the indicator variable for the pair $(V_1, V_2)$, then, rather
than using the observed value of I, it is better to condition on $V_1$ and so utilize

$$E[I \mid V_1] = P\{V_1^2 + V_2^2 \le 1 \mid V_1\}$$
$$= P\{V_2^2 \le 1 - V_1^2 \mid V_1\}$$

Now,

$$P\{V_2^2 \le 1 - V_1^2 \mid V_1 = v\} = P\{V_2^2 \le 1 - v^2\}$$
$$= P\{-\sqrt{1 - v^2} \le V_2 \le \sqrt{1 - v^2}\}$$
$$= \sqrt{1 - v^2}$$

so

$$E[I \mid V_1] = \sqrt{1 - V_1^2}$$

Thus, an improvement on using the average value of I to estimate π/4 is to use
the average value of $\sqrt{1 - V_1^2}$. Indeed, since

$$E\left[\sqrt{1 - V_1^2}\right] = \int_{-1}^{1} \frac{1}{2}\sqrt{1 - v^2}\, dv = \int_{0}^{1} \sqrt{1 - u^2}\, du = E\left[\sqrt{1 - U^2}\right]$$

where U is uniform over (0, 1), we can generate n random numbers U and use
the average value of $\sqrt{1 - U^2}$ as our estimate of π/4. (Problem 10.14 shows
that this estimator has the same variance as the average of the n values,
$\sqrt{1 - V^2}$.)


The preceding estimator of π can be improved even further by noting that the
function $g(u) = \sqrt{1 - u^2}$, 0 ≤ u ≤ 1, is a monotonically decreasing function of u,
and so the method of antithetic variables will reduce the variance of the estimator
of $E[\sqrt{1 - U^2}]$. That is, rather than generating n random numbers and using the
average value of $\sqrt{1 - U^2}$ as an estimator of π/4, we would obtain an improved
estimator by generating only n/2 random numbers U and then using one-half the
average of $\sqrt{1 - U^2} + \sqrt{1 - (1 - U)^2}$ as the estimator of π/4.


The following table gives the estimates of π resulting from simulations, using
n = 10,000, based on the three estimators.


Method                                                    Estimate of π
Proportion of the random points that fall in the circle   3.1612
Average value of $\sqrt{1 - U^2}$                          3.128448
Average value of $\sqrt{1 - U^2} + \sqrt{1 - (1 - U)^2}$   3.139578


A further simulation using the final approach and n = 64,000 yielded the estimate
3.143288.
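
The three estimators of Example 4a can be sketched side by side in Python. The function name `estimate_pi` is an assumption, and the numbers a run produces depend on the random number generator and seed, so they will not reproduce the table above.

```python
import math
import random


def estimate_pi(n, rng):
    """Return three estimates of pi: raw indicator, conditional
    expectation, and conditional expectation with antithetic variables."""
    hits = 0
    for _ in range(n):
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        hits += v1 * v1 + v2 * v2 <= 1.0          # indicator I of the unit disk
    raw = 4.0 * hits / n

    cond = sum(math.sqrt(1.0 - rng.random() ** 2) for _ in range(n))
    conditional = 4.0 * cond / n                  # average of sqrt(1 - U^2)

    half = n // 2                                 # only n/2 random numbers
    anti = 0.0
    for _ in range(half):
        u = rng.random()
        anti += math.sqrt(1.0 - u * u) + math.sqrt(1.0 - (1.0 - u) ** 2)
    antithetic = 4.0 * (anti / half) / 2.0        # one-half the average
    return raw, conditional, antithetic


if __name__ == "__main__":
    print(estimate_pi(10000, random.Random(0)))
```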


10.4.3 Control Variates 


Again, suppose that we want to use simulation to estimate E[g(X)], where
$X = (X_1, \ldots, X_n)$. But suppose now that for some function f, the expected value of f(X) is
known, say, it is E[f(X)] = μ. Then, for any constant a, we can also use

$$W = g(X) + a[f(X) - \mu]$$

as an estimator of E[g(X)]. Now,

$$\mathrm{Var}(W) = \mathrm{Var}[g(X)] + a^2\,\mathrm{Var}[f(X)] + 2a\,\mathrm{Cov}[g(X), f(X)] \qquad (4.1)$$

Simple calculus shows that the foregoing is minimized when

$$a = \frac{-\mathrm{Cov}[f(X), g(X)]}{\mathrm{Var}[f(X)]} \qquad (4.2)$$

and for this value of a,

$$\mathrm{Var}(W) = \mathrm{Var}[g(X)] - \frac{\left[\mathrm{Cov}[f(X), g(X)]\right]^2}{\mathrm{Var}[f(X)]} \qquad (4.3)$$

Unfortunately, neither Var[f(X)] nor Cov[f(X), g(X)] is usually known, so we cannot
in general obtain the foregoing reduction in variance. One approach in practice is to
use the simulated data to estimate these quantities. This approach usually yields
almost all of the theoretically possible reduction in variance.
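
The practical approach just described, estimating the constant a of Equation (4.2) from the simulated data, can be sketched in Python as follows. The function name, the single-random-number setting, and the example (estimating $E[e^U]$ with control variate f(U) = U, whose mean 1/2 is known) are all assumptions chosen for illustration.

```python
import math
import random


def control_variate_estimate(g, f, mu, n, rng):
    """Estimate E[g(U)] using f(U), with known mean mu, as a control variate."""
    us = [rng.random() for _ in range(n)]
    gs = [g(u) for u in us]
    fs = [f(u) for u in us]
    g_bar = sum(gs) / n
    f_bar = sum(fs) / n
    # Sample covariance and variance stand in for the unknown true values
    cov = sum((a - g_bar) * (b - f_bar) for a, b in zip(gs, fs)) / (n - 1)
    var_f = sum((b - f_bar) ** 2 for b in fs) / (n - 1)
    a = -cov / var_f                     # estimate of the constant in (4.2)
    return g_bar + a * (f_bar - mu)      # average of W = g + a (f - mu)


if __name__ == "__main__":
    rng = random.Random(0)
    print(control_variate_estimate(math.exp, lambda u: u, 0.5, 5000, rng))
    print(math.e - 1.0)                  # exact value for comparison
```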


Summary 


Let F be a continuous distribution function and U a uniform (0, 1) random variable.
Then the random variable $F^{-1}(U)$ has distribution function F, where $F^{-1}(u)$ is that
value x such that F(x) = u. Applying this result, we can use the values of uniform (0,
1) random variables, called random numbers, to generate the values of other random
variables. This technique is called the inverse transform method.


Another technique for generating random variables is based on the rejection method. 
Suppose that we have an efficient procedure for generating a random variable from 
the density function g and that we desire to generate a random variable having 
density function f. The rejection method for accomplishing this starts by determining
a constant c such that


$$\max_x \frac{f(x)}{g(x)} \le c$$


It then proceeds as follows: 


1. Generate Y having density g. 

2. Generate a random number U. 

3. If U < f(Y)/cg(Y), set X = Y and stop. 
4. Return to step 1. 


The number of passes through step 1 is a geometric random variable with mean c. 


Standard normal random variables can be efficiently simulated by the rejection 
method (with g being exponential with mean 1) or by the technique known as the 
polar algorithm. 


To estimate a quantity θ, one often generates the values of a partial sequence of
random variables whose expected value is θ. The efficiency of this approach is
increased when these random variables have a small variance. Three techniques
that can often be used to specify random variables with mean θ and relatively small
variances are


1. the use of antithetic variables, 
2. the use of conditional expectations, and 
3. the use of control variates. 


Problems 


10.1. The following algorithm will generate a random permutation of the
elements 1, 2, ..., n. It is somewhat faster than the one presented in
Example 1a but is such that no position is fixed until the algorithm
ends. In this algorithm, P(i) can be interpreted as the element in
position i.
Step 1. Set k = 1.
Step 2. Set P(1) = 1.
Step 3. If k = n, stop. Otherwise, let k = k + 1.
Step 4. Generate a random number U and let

P(k) = P([kU] + 1)
P([kU] + 1) = k
Go to step 3.


a. Explain in words what the algorithm is doing.

b. Show that at iteration k (that is, when the value of P(k) is
initially set) P(1), P(2), ..., P(k) is a random permutation of
1, 2, ..., k.


Hint: Use induction and argue that

$$P_k\{i_1, i_2, \ldots, i_{j-1}, k, i_j, \ldots, i_{k-2}, i\}$$
$$= P_{k-1}\{i_1, i_2, \ldots, i_{j-1}, i, i_j, \ldots, i_{k-2}\}\,\frac{1}{k}$$
$$= \frac{1}{k!} \qquad \text{by the induction hypothesis}$$

10.2. Develop a technique for simulating a random variable having
density function

$$f(x) = \begin{cases} e^{2x} & -\infty < x < 0 \\ e^{-2x} & 0 < x < \infty \end{cases}$$


10.3. Give a technique for simulating a random variable having the
probability density function

$$f(x) = \begin{cases} \frac{1}{2}(x - 2) & 2 \le x \le 3 \\ \frac{1}{2}\left(2 - \frac{x}{3}\right) & 3 < x \le 6 \\ 0 & \text{otherwise} \end{cases}$$


10.4. Present a method for simulating a random variable having
distribution function

$$F(x) = \begin{cases} 0 & x \le -3 \\ \frac{1}{2} + \frac{x}{6} & -3 < x < 0 \\ \frac{1}{2} + \frac{x^2}{32} & 0 < x \le 4 \\ 1 & x > 4 \end{cases}$$


10.5. Use the inverse transformation method to present an approach
for generating a random variable from the Weibull distribution

$$F(t) = 1 - e^{-at^{\beta}}, \qquad t \ge 0$$


10.6. Give a method for simulating a random variable having failure
rate function


a. $\lambda(t) = c$;
b. $\lambda(t) = ct$;
c. $\lambda(t) = ct^2$;
d. $\lambda(t) = ct^3$.


10.7. Let F be the distribution function

$$F(x) = x^n, \qquad 0 < x < 1$$


a. Give a method for simulating a random variable having
distribution F that uses only a single random number.
b. Let $U_1, \ldots, U_n$ be independent random numbers. Show that

$$P\{\max(U_1, \ldots, U_n) \le x\} = x^n$$


c. Use part (b) to give a second method of simulating a random
variable having distribution F.


10.8. Suppose it is relatively easy to simulate from $F_i$ for each
i = 1, ..., n. How can we simulate from

a. $F(x) = \prod_{i=1}^{n} F_i(x)$?

b. $F(x) = 1 - \prod_{i=1}^{n} [1 - F_i(x)]$?


10.9. Suppose we have a method for simulating random variables from
the distributions $F_1$ and $F_2$. Explain how to simulate from the
distribution

$$F(x) = pF_1(x) + (1 - p)F_2(x), \qquad 0 < p < 1$$


Give a method for simulating from

$$F(x) = \begin{cases} \frac{1}{3}(1 - e^{-3x}) + \frac{2}{3}x & 0 < x \le 1 \\ \frac{1}{3}(1 - e^{-3x}) + \frac{2}{3} & x > 1 \end{cases}$$


10.10. In Example 2c we simulated the absolute value of a unit
normal by using the rejection procedure on exponential random
variables with rate 1. This raises the question of whether we could
obtain a more efficient algorithm by using a different exponential
density, that is, we could use the density $g(x) = \lambda e^{-\lambda x}$. Show that the
mean number of iterations needed in the rejection scheme is minimized
when λ = 1.

10.11. Use the rejection method with g(x) = 1, 0 < x < 1, to determine
an algorithm for simulating a random variable having density function

$$f(x) = \begin{cases} 60x^3(1 - x)^2 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$


10.12. Explain how you could use random numbers to approximate
$\int_0^1 k(x)\, dx$, where k(x) is an arbitrary function.

Hint: If U is uniform on (0, 1), what is E[k(U)]?

10.13. Let (X, Y) be uniformly distributed in the circle of radius 1
centered at the origin. Its joint density is thus

$$f(x, y) = \frac{1}{\pi}, \qquad 0 \le x^2 + y^2 \le 1$$

Let $R = (X^2 + Y^2)^{1/2}$ and $\Theta = \tan^{-1}(Y/X)$ denote the polar
coordinates of (X, Y). Show that R and $\Theta$ are independent, with $R^2$ being
uniform on (0, 1) and $\Theta$ being uniform on $(0, 2\pi)$.

10.14. In Example 4a, we showed that

$$E[(1 - V^2)^{1/2}] = E[(1 - U^2)^{1/2}] = \frac{\pi}{4}$$

when V is uniform (−1, 1) and U is uniform (0, 1). Now show that

$$\mathrm{Var}[(1 - V^2)^{1/2}] = \mathrm{Var}[(1 - U^2)^{1/2}]$$

and find their common value.
10.15.
a. Verify that the minimum of (4.1) occurs when a is as given by
(4.2).
b. Verify that the minimum of (4.1) is given by (4.3).


10.16. Let X be a random variable on (0, 1) whose density is f(x).
Show that we can estimate $\int_0^1 g(x)\, dx$ by simulating X and then taking
g(X)/f(X) as our estimate. This method, called importance sampling,
tries to choose f similar in shape to g, so that g(X)/f(X) has a small
variance.


Self-Test Problems and Exercises 


10.1. The random variable X has probability density function

$$f(x) = Ce^{x}, \qquad 0 < x < 1$$


a. Find the value of the constant C.
b. Give a method for simulating such a random variable.


10.2. Give an approach for simulating a random variable having probability
density function

$$f(x) = 30(x^2 - 2x^3 + x^4), \qquad 0 < x < 1$$


10.3. Give an efficient algorithm to simulate the value of a random variable
with probability mass function

$$p_1 = .15 \qquad p_2 = .2 \qquad p_3 = .35 \qquad p_4 = .30$$


10.4. If X is a normal random variable with mean μ and variance σ², define a
random variable Y that has the same distribution as X and is negatively
correlated with it.
10.5. Let X and Y be independent exponential random variables with mean 1.
a. Explain how we could use simulation to estimate $E[e^{XY}]$.
b. Show how to improve the estimation approach in part (a) by using a
control variate.


Answers to Selected Problems


Chapter 1 


. 67,600,000; 19,656,000 
. 1296 

24; 4 

. 144; 18 

2401 

. 720; 72; 144; 72 

. 120; 1260; 34,650 

. 27,720 

. 40,320; 10,080; 1152; 2880; 384 
. 720; 72; 144 

. 280, 270 

. 89 

. 24,300,000; 17,100,720 
. 190 

. 2,598,960 

. 42; 94 

. 604,800 

. 600 

. 896; 1000; 910 

. 36; 26 

35 

18 

. 48 

.521/(131)* 

. 27,720 

. 65,536; 2520 

. 12,600; 945 

. 564,480 

. 165; 35 

. 1287; 14,112
36. 220; 572


Chapter 2 


9.74 

10. .4; .1 

11. 70; 2 

12. .5; .32; 149/198 
13. 20,000; 12,000; 11,000; 68,000; 10,000 
14. 1.057 

15. .0020; .4226; .0475; .0211; .00024 
17. .1102 

18. .048 

19. 5/18 

20. .9052 

22. (n+1)/2" 

23. 5/12 

25. .4 

26. .492929 

28. .0888; .2477; .1244; .2099 
30. 1/18; 1/6; 1/2 
31. 2/9; 1/9 

33. 70/323 

34. .000547 

36. .0045; .0588 
37. .0833; .5 

38. 4 

39. .48 

40. .8134; .1148 
41. .5177 

44. .3; .2; .1 

46.5 

47. .1399 

48. .00106 

49. .4329 

50. 2.6084 x 107° 
52. .2133841 

53. 12/35 

54. .0511 

55. .2198; .0342


Chapter 3


. 1/3 

. 1/6; 1/5; 1/4; 1/3; 1/2; 1 
.339 

. 6/91 

1/2 

2/3 

. 1/2 

Tit 

10. .22 

- 1/17; 1/33 

. 2/3 

. 504; .3629 

. 35/768; 35/128 

. 4848 

. 9835 

. 0792; .264 

. .331; .383; .286; .4862 
. 44.29; 41.18 

. 4; 1/26 

. 496; 3/14; 9/62 

. 5/9; 1/6; 5/54 

. 4/9; 1/2 

. 1/3; 1/2 

. 20/21; 40/41 

. 3/128; 29/1536 

. 0893 

. 7/12; 3/5 

. 16, 49/76 

. 27/31 

. 62, 10/19 

. 1/2 

bs WS; 1 

. 12/37 

. 46/185 

. 3/13; 5/13; 5/52; 15/52 
. 43/459 

. 1.03 percent; .3046 
. 419 

. .58; 28/58
50. 2/3
52. .175; 38/165; 17/33 
53. .65; 56/65; 8/65; 1/65; 14/35; 12/35; 9/35 


1 


55. 3/20; 17/27 

56. .40; 17/40; 3/8; 0.08825 

57. p?/[p3 + (1—p)*1;[p3(1- (1—-p)*) + A -p)? 
(1—p*)]/[p? + (1—-p)*] 

58. (1/2)/(1 — (1/2)" *) 

60. 9

62. (c) 2/3 

65. 2/3; 1/3; 3/4 

66. 1/6; 3/20 

69. .4375 

73. (i) 9/128, 9/128, 18/128, 110/128 (ii) 1/32, 1/32, 1/16, 15/16 
74. 1/9; 1/18 

76. 1/16; 1/32; 5/16; 1/4; 31/32 

77. 9/19 

78. 3/4, 7/12 

81. 2p^3(1 - p) + 2p(1 - p)^3; p^2/(1 - 2p + 2p^2)
82. .5550 

86. .5; .6; .8 

87. 9/19; 6/19; 4/19; 7/15; 53/165; 7/33 

91. 9/16 

94. 97/142; 15/26; 33/102 


1 : 
95. —(1- (1-p)") 


96. p1(1 — p2) —p,p,/2; p,/(2 - P2) 


Chapter 4 


1. p(4) = 6/91; p(2) = 8/91; p(1) = 32/91; p(0) = 1/91; p( — 1) = 16/91; 
p( — 2) = 28/91 

4. (a) 1/2; 5/18; 5/36; 5/84; 5/252; 1/252; 0; 0; 0; 0 

5.n—2i;i=0,..,n 

6. p(3) = p(— 3) = 1/8; p(1) = p(- 1) = 3/8 

11b. log, ,U +1) 

12. p(4) = 1/16; p(3) = 1/8; p(2) = 1/16; p(0) = 1/2; p(— i) = p@; 

p(0) =1 

13. p(0) = .28; p(500) = .27, p(1000) = .315; p(1500) = .09;
p(2000) = .045


14. 
16. 
Ads 
19. 
20. 
21. 
24. 
25. 
Zi. 
28. 
31. 
32. 
33. 
35. 
38. 
40. 
41. 
43. 
46. 
52. 
53. 
54. 
56. 
57. 
58. 
59. 
63. 
64. 
66. 
68. 
69. 
70. 
rau 
73. 
74. 
76. 
ge 
84. 


85 
86 


p(0) = 1/2; p(1) = 1/6; p(2) = 1/12; p(3) = 1/20; p(4) = 1/5 
kK/(kK+1)1<sk<n1/nik=n 

1/4; 1/6; 1/12; 1/2 

1/2; 1/10; 1/5; 1/10; 1/10 

-5918; no; —.108 

39.28; 37 

p = 11/18; maximum = 23/72 

.46, 1.3 

A(p + 1/10) 

3/5 


* 


p 
11 — 10(.9)*° 

3 

—.067; 1.089 

82.2; 84.5 

3/8 

11/243 

2.8; 1.476 

3 

17/12; 99/60 

1/10; 1/10 

e~?2:1—1.2e—? 

1 —e7 8; 1 — 7 219-18 

e~22; 1 — 3,2¢e722 

.03239 

110; 

.8886 

.4082 

.0821; .2424 

.3935; .2293; .3935 

2/(2n —1); 2/(2n— 2); e71 
2/n; (2n—3)/(n —1)*; e72 


= -5 
e 10e 


prU=pe" 

1500; .1012 

5.8125 

32/243; 4864/6561; 160/729; 160/729 
3/10; 5/6; 75/138 

. 3439 

1.5 


89. .1793; 1/3; 4/3 


Chapter 5 


soe 7) 

.no; no 

. 1/2, .8999 

4=(01y* 

. 4,0, 00 

. 3/5; 6/5 

2 

10. 2/3; 2/3 

11. 2/5 

13. 2/3; 1/3 

15. .7977; .6827; .3695; .9522; .1587 
16. (.9938)*° 

17...315; .136 

18. 22.66 

19. 14.56 

20. .9994; .75; .977 

22. .974 

23. .9253; .1762 

26. .0606; .0525 

28. .8363 

29. .9993 

32. e—1; e~ 1/2 

33. exponential with parameter 1 
34,¢°° 1/3 

35. exponential with parameter 1/c 
39. 3/5 

41.a= —2,b=22 

42. 1ly 


ONOAA AR WD 


Chapter 6 


2. (a) 14/39; 10/39; 10/39; 5/39 (b) 84; 70; 70; 70; 40; 40; 40; 15 all divided 
by 429 

3. 15/26; 5/26; 5/26; 1/26 

4. (a) 64/169; 40/169; 40/169; 25/169 

6. .20, .30, .30, .20; .18, .30, .31, .21; 2.5; 2.55; 1.05; 1.0275 


7. p(i, j) = p^2(1 - p)^{i+j}
8. c = 1/8; E[X] = 0
9. (12x? + 6x) /7; 15/56; .8625;5/7;8/7 


a0. 
12. 
13. 
15. 
16. 
re 
19. 
21. 
22. 
23. 
25. 


28. 


29. 
30. 
31. 
32. 
33. 
34. 
35. 
36. 
37. 
39. 
40. 
45. 
46. 
50. 
51. 
52. 
56. 
57. 


60 


i27i=e 

39.3e75 

1/6; 1/2 

1/4 

n(1/2)"~* 

1/3 
-In(y),0<y<1;1,0<x<1;1/2;1/4 
2/5; 2/5 

no; 1/3 

1/2; 2/3; 1/20; 1/18 

efi) 

se 1—3e7? 

.0326 

.3772; .2061 

.0829; .3766 

5/16; .0228 

P(X, + Xz > 25); P(X, > 15) 
20+ 5v2 

(a) .6572; (b) yes; (d) .2402 
.9346 

e~2:1—3e-2 

5/13; 8/13 

1/6; 5/6; 1/4; 3/4 

(y +.1)?xe-*O'+D; xe-*Y; e~4 
1/2 + 3y/(4x) — y?/(4x%) 
((L — 2d)/L)* 

.19297 

1 — e~54a, (ie en 4a? 

r/ 

' fa 

(a) u/(v + 1)” 


Chapter 7 


1. 52.5/12 

2. 324; 198.8 
3. 1/2; 1/4; 0 
4. 1/6; 1/4; 1/2 


5. 3/2 


6. 35 

7. .9; 4.9; 4.2 
8.(1-(1-p)")/p 
10. .6; 0 


11. 2(n — 1)p(1 — p) 

12. (3n? — n)/(4n — 2), 3n?/(4n — 2) 
14. m/(1—p) 

15. 1/2 

18.4 

21. .9301; 87.5755 

22. 14.7 

23. 147/110 

26. n/(n+1);1/(n+1) 


29. —— 


31. 175/6 

33. 14, 45 

34. 20/19; 360/361 

35. 21.2; 18.929; 49.214 
36. —n/36 

37.0 

38. 1.94, 2.22; .6964, .6516; .0932; .1384 
40. 1/8 

43. 6; 112/33 

44. 100/19; 16,200/6137; 10/19; 3240/6137 
47. 1/2; 0 

49.1/(n-1) 

50. 6; 7; 5.8192 

51. 6.07 

52. 2y* 

53. y3/4 

55. 12 

56. 8 

58. N(1 — e 19/%) 

59. 12.5 

64 


4 4\ = 
ptt, >. (joa — py*‘e“@*9(4 +. 1)°/64 (1 + p)/(1 —p)s (A — P)e)/(e? —P) 


65. 1/2; 1/3; 1/(n(n + 1)) 
68. -96/145
70. 4.2; 5.16

71. 218 

72:x(1+ Qp—1)7]° 
74. 1/2; 1/16; 2/81 

75. 1/2, 1/3 

77. 1/i; [iG + 1)]7 7; 
78. u;1+ 07; yes; co? 
84. .151; .141 


Chapter 8 


1. > 19/20 

2.15/17; =>3/4; =10 

3. 23 

4. < 4/3; 8428 

5. .1416 

6. .9431 

7. .3085 

8. .6932 

9. (258)7 

10. 117 

11. 2 O57 

13. .0162; .0003; .2514; .2514 
14. n> 23 

16. .013; .018; .691 

18. <.2 

23. .769; .357; .4267; .1093; .112184 
24. answer is (a) 


Chapter 9 


1. 1/9; 5/9 

3. .9953; .9735; .9098; .7358 
10. (b)1/6 

14. 2.585; .5417; 3.1267 

15. 5.5098 


Solutions to Self-Test Problems and Exercises


Chapter 1 


1.1.
a. There are 4! different orderings of the letters C, D, E, F. For each of
these orderings, we can obtain an ordering with A and B next to each
other by inserting A and B, either in the order A, B or in the order B, A,
in any of 5 places, namely, either before the first letter of the
permutation of C, D, E, F, or between the first and second, and so on.
Hence, there are 2·5·4! = 240 arrangements. Another way of solving
this problem is to imagine that B is glued to the back of A. Then there
are 5! orderings in which A is immediately before B. Since there are
also 5! orderings in which B is immediately before A, we again obtain a
total of 2·5! = 240 different arrangements.
b. There are 6! = 720 possible arrangements, and since there are as
many with A before B as with B before A, there are 360 arrangements.
c. Of the 720 possible arrangements, there are as many that have A
before B before C as have any of the 3! possible orderings of A, B, and
C. Hence, there are 720/6 = 120 possible orderings.
d. Of the 360 arrangements that have A before B, half will have C before
D and half D before C. Hence, there are 180 arrangements having A
before B and C before D.
e. Gluing B to the back of A and D to the back of C yields 4! = 24 different
orderings in which B immediately follows A and D immediately follows
C. Since the order of A and B and of C and D can be reversed, there
are 4·24 = 96 different arrangements.
f. There are 5! orderings in which E is last. Hence, there are
6! - 5! = 600 orderings in which E is not last.

1.2. 3! 4! 3! 3!, since there are 3! possible orderings of countries and then
the countrymen must be ordered.
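
The six counts in Self-Test Problem 1.1 are small enough to confirm by listing all 6! orderings. The following is a minimal sketch, not part of the original solution; the helper names `count`, `adjacent`, and `before` are illustrative, and part (e) is read as "A and B adjacent and C and D adjacent", as the gluing argument suggests.

```python
from itertools import permutations

def count(pred):
    """Count the orderings of the letters A-F (as strings) that satisfy pred."""
    return sum(1 for p in permutations("ABCDEF") if pred("".join(p)))

def adjacent(s, x, y):
    """True if letters x and y occupy neighboring positions in s."""
    return abs(s.index(x) - s.index(y)) == 1

def before(s, x, y):
    """True if letter x appears somewhere to the left of letter y in s."""
    return s.index(x) < s.index(y)

counts = {
    "a": count(lambda s: adjacent(s, "A", "B")),
    "b": count(lambda s: before(s, "A", "B")),
    "c": count(lambda s: before(s, "A", "B") and before(s, "B", "C")),
    "d": count(lambda s: before(s, "A", "B") and before(s, "C", "D")),
    "e": count(lambda s: adjacent(s, "A", "B") and adjacent(s, "C", "D")),
    "f": count(lambda s: s[-1] != "E"),
}
print(counts)
```

The enumeration reproduces 240, 360, 120, 180, 96, and 600 for parts (a) through (f).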


1.3. 


a. 10·9·8 = 720
b. 8·7·6 + 2·3·8·7 = 672. The result of part (b) follows because there
are 8·7·6 choices not including A or B and there are 3·8·7 choices
in which a specified one of A and B, but not the other, serves. The latter
follows because the serving member of the pair can be assigned to any
of the 3 offices, the next position can then be filled by any of the other 8
people, and the final position by any of the remaining 7.
c. 8·7·6 + 3·2·8 = 384.


1.5. \binom{7}{3,2,2} = 7!/(3! 2! 2!) = 210

1.6. There are \binom{7}{3} = 35 choices of the three places for the letters. For
each choice, there are (26)^3 (10)^4 different license plates. Hence, altogether
there are 35·(26)^3·(10)^4 different plates.
1.7. Any choice of r of the n items is equivalent to a choice of n — r, 
namely, those items not selected. 
1.8.
a. 10·9·9···9 = 10·9^{n-1}
b. \binom{n}{i} 9^{n-i}, since there are \binom{n}{i} choices of the i places to
put the zeroes and then each of the other n - i positions can be any of the
digits 1, ..., 9.


1.9. 


EQ) =m 


\binom{3n}{3} = 3\binom{n}{3} + 3n^2(n - 1) + n^3


@ 


2. 
oN SON 


1.10. There are 9·8·7·6·5 numbers in which no digit is repeated. There
are \binom{5}{2}·8·7·6 numbers in which only one specified digit appears twice, so
there are 9\binom{5}{2}·8·7·6 numbers in which only a single digit appears twice.
There are 7·5!/(2! 2!) numbers in which two specified digits appear twice, so there
are \binom{9}{2}·7·5!/(2! 2!) numbers in which two digits appear twice. Thus, the
answer is

9·8·7·6·5 + 9\binom{5}{2}·8·7·6 + \binom{9}{2}·7·5!/(2! 2!)
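
The three-term count in Self-Test Problem 1.10 can be compared with a direct enumeration. This sketch assumes, as the solution implies, that the numbers are 5-digit strings over the digits 1 through 9 in which no digit appears more than twice; the variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import comb, factorial

# Brute force: keep every 5-digit string over 1..9 whose most frequent
# digit occurs at most twice.
brute = sum(
    1
    for digits in product(range(1, 10), repeat=5)
    if max(Counter(digits).values()) <= 2
)

# The three cases from the solution: no repeats, exactly one digit
# doubled, exactly two digits doubled.
no_repeat = 9 * 8 * 7 * 6 * 5
one_double = 9 * comb(5, 2) * 8 * 7 * 6
two_doubles = comb(9, 2) * 7 * factorial(5) // (2 * 2)
formula = no_repeat + one_double + two_doubles

print(brute, formula)
```

Both approaches give 52,920.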


1.11.
a. We can regard this as a seven-stage experiment. First choose the 6
married couples that have a representative in the group, and then
select one of the members of each of these couples. By the
generalized basic principle of counting, there are \binom{10}{6} 2^6 different
choices.
b. First select the 6 married couples that have a representative in the
group, and then select the 3 of those couples that are to contribute a
man. Hence, there are \binom{10}{6}\binom{6}{3} = 10!/(4! 3! 3!) different choices.
Another way to solve this is to first select 3 men and then select 3 women not
related to the selected men. This shows that there are
\binom{10}{3}\binom{7}{3} = 10!/(3! 3! 4!) different choices.


1.12. \binom{8}{3}\binom{7}{3} + \binom{8}{4}\binom{7}{2} = 3430. The first term
gives the number of committees that have 3 women and 3 men; the second gives
the number that have 4 women and 2 men.
1.13. (number of solutions of x_1 + ··· + x_5 = 4) (number of solutions of
x_1 + ··· + x_5 = 5) (number of solutions of x_1 + ··· + x_5 = 6)
= \binom{8}{4}\binom{9}{4}\binom{10}{4}

1.14. Since there are \binom{j-1}{n-1} positive vectors whose sum is j, there must be
\sum_{j=n}^{k} \binom{j-1}{n-1} such vectors. But \binom{j-1}{n-1} is the number of
subsets of size n from the set of numbers {1, ..., k} in which j is the largest
element in the subset. Consequently, \sum_{j=n}^{k} \binom{j-1}{n-1} is just the
total number of subsets of size n from a set of size k, showing that the
preceding answer is equal to \binom{k}{n}.
1.15. Let us first determine the number of different results in which k
people pass. Because there are \binom{n}{k} different groups of size k and k!
possible orderings of their scores, it follows that there are \binom{n}{k} k!
possible results in which k people pass. Consequently, there are
\sum_{k=0}^{n} \binom{n}{k} k! possible results.

1.16. The number of subsets of size 4 is \binom{20}{4} = 4845. Because the
number of these that contain none of the first five elements is
\binom{15}{4} = 1365, the number that contain at least one is 3480. Another way
to solve this problem is to note that there are \binom{5}{i}\binom{15}{4-i} that
contain exactly i of the first five elements and sum this for i = 1, 2, 3, 4.
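
Both routes to the answer of Self-Test Problem 1.16 can be checked against an explicit enumeration of the 4-element subsets. This is an illustrative sketch that assumes the underlying set is {1, ..., 20} and that "the first five elements" are 1 through 5.

```python
from itertools import combinations
from math import comb

first_five = set(range(1, 6))

# Direct enumeration of the 4-subsets of {1, ..., 20} that meet {1, ..., 5}.
brute = sum(
    1 for subset in combinations(range(1, 21), 4) if first_five & set(subset)
)

# Complement argument: all subsets minus those avoiding the first five.
by_complement = comb(20, 4) - comb(15, 4)

# Case argument: exactly i of the first five, for i = 1, 2, 3, 4.
by_cases = sum(comb(5, i) * comb(15, 4 - i) for i in range(1, 5))

print(brute, by_complement, by_cases)
```

All three values equal 3480.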


1.17. Multiplying both sides by 2, we must show that
n(n - 1) = k(k - 1) + 2k(n - k) + (n - k)(n - k - 1)

This follows because the right side is equal to
k^2(1 - 2 + 1) + k(-1 + 2n - n - n + 1) + n(n - 1)

For a combinatorial argument, consider a group of n items and a subgroup of
k of the n items. Then \binom{k}{2} is the number of subsets of size 2 that
contain 2 items from the subgroup of size k, k(n - k) is the number that
contain 1 item from the subgroup, and \binom{n-k}{2} is the number that contain
0 items from the subgroup. Adding these terms gives the total number of
subgroups of size 2, namely, \binom{n}{2}.


1.18. There are 3 choices that can be made from families consisting of a
single parent and 1 child; there are 3·1·2 = 6 choices that can be made from
families consisting of a single parent and 2 children; there are 5·2·1 = 10
choices that can be made from families consisting of 2 parents and a single
child; there are 7·2·2 = 28 choices that can be made from families
consisting of 2 parents and 2 children; there are 6·2·3 = 36 choices that can
be made from families consisting of 2 parents and 3 children. Hence, there
are 83 possible choices.

1.19. First choose the 3 positions for the digits, and then put in the letters
and digits. Thus, there are \binom{8}{3}·26·25·24·23·22·10·9·8 different plates.
If the digits must be consecutive, then there are 6 possible positions for the
digits, showing that there are now 6·26·25·24·23·22·10·9·8 different
plates.
1.20.
a. Follows since n!/(x_1! ··· x_r!) is the number of n letter permutations of
the values 1, ..., r in which i appears x_i times, \sum_{i=1}^{r} x_i = n.
b De. a ay FD 


bX = nXy!: “Xy! 


1.21. (1 - 1)^n = 1 - \binom{n}{1} + \binom{n}{2} - ··· + (-1)^n \binom{n}{n},
giving that

\binom{n}{1} - \binom{n}{2} + ··· + (-1)^{n+1} \binom{n}{n} = 1


Chapter 2 


2.1. 
a. 2·3·4 = 24
b. 2·3 = 6
c. 3·4 = 12
d. AB = {(c, pasta, i), (c, rice, i), (c, potatoes, i)}


® 2000 
oOo rR WNP 


f ABC = {(c, rice, i)} 


2.2. Let A be the event that a suit is purchased, B be the event that
a shirt is purchased, and C be the event that a tie is purchased. Then
P(A ∪ B ∪ C) = .22 + .30 + .28 - .11 - .14 - .10 + .06 = .51

a. 1 - .51 = .49
b. The probability that two or more items are purchased is
P(AB ∪ AC ∪ BC) = .11 + .14 + .10 - .06 - .06 - .06 + .06 = .23

Hence, the probability that exactly 1 item is purchased is
.51 - .23 = .28.


2.3. By symmetry, the 14th card is equally likely to be any of the 52
cards; thus, the probability is 4/52. A more formal argument is to count
the number of the 52! outcomes for which the 14th card is an ace. This
yields

p = 4·51·50···2·1/(52)! = 4/52

Letting A be the event that the first ace occurs on the 14th card, we
have

P(A) = 48·47···36·4/(52·51···40·39) ≈ .0312


2.4. Let D denote the event that the minimum temperature is 70
degrees. Then
P(A ∪ B) = P(A) + P(B) - P(AB) = .7 - P(AB)
P(C ∪ D) = P(C) + P(D) - P(CD) = .2 + P(D) - P(DC)

Since A ∪ B = C ∪ D and AB = CD, subtracting one of the preceding
equations from the other yields

0 = .5 - P(D)
or P(D) = .5.
2.5.
a. 52·48·44·40/(52·51·50·49) = .6761
b. 52·39·26·13/(52·51·50·49) = .1055

2.6. Let R be the event that both balls are red, and let B be the event
that both are black. Then

P(R ∪ B) = P(R) + P(B) = (3·4)/(6·10) + (3·6)/(6·10) = 1/2
2.7.
a. 1/\binom{40}{8} = 1.3 × 10^{-8}
b. \binom{8}{7}\binom{32}{1}/\binom{40}{8} = 3.3 × 10^{-6}
c. \binom{8}{6}\binom{32}{2}/\binom{40}{8} + 1.3 × 10^{-8} + 3.3 × 10^{-6} = 1.8 × 10^{-4}
2.8.
a. 3·4·4·3/\binom{14}{4} = .1439
b. \binom{4}{2}\binom{4}{2}/\binom{14}{4} = .0360
c. \binom{8}{4}/\binom{14}{4} = .0699
2.9. Let S = ∪_{i=1}^{n} A_i and consider the experiment of randomly
choosing an element of S. Then P(A) = N(A)/N(S), and the results
follow from Propositions 4.3 and 4.4.

2.10. Since there are 5! = 120 outcomes in which the position of
horse number 1 is specified, it follows that N(A) = 360. Similarly,
N(B) = 120, and N(AB) = 2·4! = 48. Hence, from Self-Test Problem
2.9, we obtain N(A ∪ B) = 432.

2.11. One way to solve this problem is to start with the
complementary probability that at least one suit does not appear. Let
A_i, i = 1, 2, 3, 4, be the event that no cards from suit i appear. Then

P(∪_{i=1}^{4} A_i) = \sum_i P(A_i) - \sum_j \sum_{i: i<j} P(A_i A_j) + ··· - P(A_1 A_2 A_3 A_4)
= 4 \binom{39}{5}/\binom{52}{5} - \binom{4}{2} \binom{26}{5}/\binom{52}{5} + \binom{4}{3} \binom{13}{5}/\binom{52}{5}

The desired probability is then 1 minus the preceding. Another way to
solve is to let A be the event that all 4 suits are represented, and then
use
P(A) = P(n, n, n, n, o) + P(n, n, n, o, n) + P(n, n, o, n, n) + P(n, o, n, n, n)

where P(n, n, n, o, n), for instance, is the probability that the first card is
from a new suit, the second is from a new suit, the third is from a new
suit, the fourth is from an old suit (that is, one which has already
appeared) and the fifth is from a new suit. This gives

P(A) = (52·39·26·13·48 + 52·39·26·36·13)/(52·51·50·49·48)
       + (52·39·24·26·13 + 52·12·39·26·13)/(52·51·50·49·48)
     = 52·39·26·13(48 + 36 + 24 + 12)/(52·51·50·49·48)
     = .2637
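
The two approaches in Self-Test Problem 2.11 should give exactly the same probability, which can be confirmed with rational arithmetic. The sketch below assumes a 5-card hand from a standard 52-card deck, as the binomial coefficients in the solution indicate; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)

# Inclusion-exclusion on "at least one suit is missing".
p_some_suit_missing = Fraction(
    4 * comb(39, 5) - comb(4, 2) * comb(26, 5) + comb(4, 3) * comb(13, 5),
    hands,
)
p_all_suits = 1 - p_some_suit_missing

# Sequential argument: exactly one of cards 2..5 comes from an "old" suit.
p_sequential = Fraction(
    52 * 39 * 26 * 13 * (48 + 36 + 24 + 12),
    52 * 51 * 50 * 49 * 48,
)

print(p_all_suits, float(p_all_suits))
```

The two fractions agree exactly, and both round to .2637.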


2.12. There are (10)!/2^5 different divisions of the 10 players into a
first roommate pair, a second roommate pair, and so on. Hence, there
are (10)!/(5! 2^5) divisions into 5 roommate pairs. There are \binom{6}{2}\binom{4}{2}
ways of choosing the frontcourt and backcourt players to be in the
mixed roommate pairs and then 2 ways of pairing them up. As there is
then 1 way to pair up the remaining two backcourt players and
4!/(2! 2^2) = 3 ways of making two roommate pairs from the remaining
four frontcourt players, the desired probability is

P{2 mixed pairs} = \binom{6}{2}\binom{4}{2}(2)(3) / [(10)!/(5! 2^5)] = .5714
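
The pairing count in Self-Test Problem 2.12 can be checked by generating every division of the 10 players into roommate pairs. This is a small sketch under the assumption, taken from the solution, that there are 6 frontcourt and 4 backcourt players; the labels `F0`..`F5` and `B0`..`B3` and the helper `pairings` are illustrative.

```python
from fractions import Fraction

def pairings(players):
    """Yield every way to split the list of players into unordered pairs."""
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

players = ["F" + str(i) for i in range(6)] + ["B" + str(i) for i in range(4)]

total = 0
two_mixed = 0
for division in pairings(players):
    total += 1
    # A pair is mixed when its two labels start with different letters.
    mixed = sum(1 for a, b in division if a[0] != b[0])
    if mixed == 2:
        two_mixed += 1

probability = Fraction(two_mixed, total)
print(total, two_mixed, probability)
```

The enumeration finds 945 divisions, 540 of which have exactly two mixed pairs, so the probability is 4/7 ≈ .5714.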


2.13. Let R denote the event that letter R is repeated; similarly,
define the events E and V. Then

P{same letter} = P(R) + P(E) + P(V)
= (2/7)(1/8) + (3/7)(1/8) + (1/7)(1/8) = 3/28

2.14. Let B_1 = A_1, B_i = A_i (∪_{j=1}^{i-1} A_j)^c, i > 1. Then

P(∪_{i=1}^{∞} A_i) = P(∪_{i=1}^{∞} B_i)
= \sum_{i=1}^{∞} P(B_i)
≤ \sum_{i=1}^{∞} P(A_i)

where the final equality uses the fact that the B_i are mutually exclusive.
The inequality then follows, since B_i ⊂ A_i.
2.15.

P(∩_{i=1}^{∞} A_i) = 1 - P((∩_{i=1}^{∞} A_i)^c)
= 1 - P(∪_{i=1}^{∞} A_i^c)
≥ 1 - \sum_{i=1}^{∞} P(A_i^c)
= 1

2.16. The number of partitions for which {1} is a subset is equal to
the number of partitions of the remaining n - 1 elements into k - 1
nonempty subsets, namely, T_{k-1}(n - 1). Because there are T_k(n - 1)
partitions of {2, ..., n} into k nonempty subsets and then a choice of
k of them in which to place element 1, it follows that there are
k T_k(n - 1) partitions for which {1} is not a subset. Hence, the result
follows.
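
The recursion argued in Self-Test Problem 2.16, T_k(n) = T_{k-1}(n - 1) + k T_k(n - 1), can be compared with an independent count of set partitions. The sketch below is illustrative only: the function names `T` and `brute` and the boundary conventions (T_1(n) = T_n(n) = 1, and 0 outside 1 ≤ k ≤ n) are assumptions made here, not taken from the text.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(k, n):
    """Partitions of an n-element set into k nonempty subsets, via the recursion."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    return T(k - 1, n - 1) + k * T(k, n - 1)

def brute(k, n):
    """Count the same partitions by assigning elements to blocks one at a time."""
    def place(i, blocks):
        if i == n:
            return 1 if blocks == k else 0
        # Element i either joins one of the existing blocks or opens a new one.
        return blocks * place(i + 1, blocks) + place(i + 1, blocks + 1)
    return place(0, 0)

print(T(2, 4), T(3, 5))
```

For example, T_2(4) = 7 and T_3(5) = 25, and the two functions agree for all 1 ≤ k ≤ n ≤ 8.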
2.17. Let R, W, B denote, respectively, the events that there are no
red, no white, and no blue balls chosen. Then

P(R ∪ W ∪ B) = P(R) + P(W) + P(B) - P(RW) - P(RB) - P(WB) + P(RWB)
= [\binom{13}{5} + \binom{12}{5} + \binom{11}{5} - \binom{7}{5} - \binom{6}{5} - \binom{5}{5}] / \binom{18}{5}
≈ 0.2933

Thus, the probability that all colors appear in the chosen subset is
approximately 1 - 0.2933 = 0.7067.
2.18.
a. 8·7·6·5·4/(17·16·15·14·13) = 2/221
b. Because there are 9 nonblue balls, the probability is
9·8·7·6·5/(17·16·15·14·13) = 9/442.
c. Because there are 3! possible orderings of the different colors
and all possibilities for the final 3 balls are equally likely, the
probability is 3!·4·8·5/(17·16·15) = 4/17.
d. The probability that the red balls are in a specified 4 spots is
4·3·2·1/(17·16·15·14). Because there are 14 possible locations of the
red balls where they are all together, the probability is
14·4·3·2·1/(17·16·15·14) = 1/170.


2.19.
a. The probability that the 10 cards consist of 4 spades, 3 hearts, 2
diamonds, and 1 club is
\binom{13}{4}\binom{13}{3}\binom{13}{2}\binom{13}{1}/\binom{52}{10}. Because there
are 4! possible choices of the suits to have 4, 3, 2, and 1 cards,
respectively, it follows that the probability is
24\binom{13}{4}\binom{13}{3}\binom{13}{2}\binom{13}{1}/\binom{52}{10}.
b. Because there are \binom{4}{2} = 6 choices of the two suits that are to
have 3 cards and then 2 choices for the suit to have 4 cards, the
probability is 12\binom{13}{3}\binom{13}{3}\binom{13}{4}/\binom{52}{10}.

2.20. All the red balls are removed before all the blue ones if and
only if the very last ball removed is blue. Because all 30 balls are
equally likely to be the last ball removed, the probability is 10/30.


Chapter 3 


3.1.
a. P(no aces) = \binom{35}{13}/\binom{39}{13}
b. 1 - P(no aces) - 4\binom{35}{12}/\binom{39}{13}
c. P(i aces) = \binom{3}{i}\binom{36}{13-i}/\binom{39}{13}

3.2. Let L_i denote the event that the life of the battery is greater than
10,000 × i miles.

a. P(L_2|L_1) = P(L_1 L_2)/P(L_1) = P(L_2)/P(L_1) = 1/2
b. P(L_3|L_1) = P(L_1 L_3)/P(L_1) = P(L_3)/P(L_1) = 1/8


3.3. Put 1 white and 0 black balls in urn one, and the remaining 9 white and
10 black balls in urn two.

3.4. Let T be the event that the transferred ball is white, and let W be the
event that a white ball is drawn from urn B. Then

P(T|W) = P(W|T)P(T) / [P(W|T)P(T) + P(W|T^c)P(T^c)]
= (2/7)(2/3) / [(2/7)(2/3) + (1/7)(1/3)] = 4/5
3.5.
a. P(E|E ∪ F) = P(E(E ∪ F))/P(E ∪ F) = P(E)/[P(E) + P(F)], since
E(E ∪ F) = E and P(E ∪ F) = P(E) + P(F) because E and F are mutually
exclusive.
b. P(E_j | ∪_{i=1}^{∞} E_i) = P(E_j ∪_{i=1}^{∞} E_i)/P(∪_{i=1}^{∞} E_i)
= P(E_j)/\sum_{i=1}^{∞} P(E_i)

3.6. Let B_i denote the event that ball i is black, and let R_i = B_i^c. Then

P(B_1|R_2) = P(R_2|B_1)P(B_1) / [P(R_2|B_1)P(B_1) + P(R_2|R_1)P(R_1)]
= [r/(b + r + c)][b/(b + r)] / {[r/(b + r + c)][b/(b + r)] + [(r + c)/(b + r + c)][r/(b + r)]}
= b/(b + r + c)


3.7. Let B denote the event that both cards are aces.
a. P{B | yes to ace of spades} = P{B, yes to ace of spades}/P{yes to ace of spades}
= [\binom{1}{1}\binom{3}{1}/\binom{52}{2}] / [\binom{1}{1}\binom{51}{1}/\binom{52}{2}]
= 3/51
b. Since the second card is equally likely to be any of the remaining 51, of
which 3 are aces, we see that the answer in this situation is also 3/51.
c. Because we can always interchange which card is considered first and
which is considered second, the result should be the same as in part
(b). A more formal argument is as follows:

P{B | second is ace} = P{B, second is ace}/P{second is ace}
= P(B) / [P(B) + P{first is not ace, second is ace}]
= (4/52)(3/51) / [(4/52)(3/51) + (48/52)(4/51)]
= 3/51
d. P{B | at least one} = P(B)/P{at least one}
= (4/52)(3/51) / [1 - (48/52)(47/51)]
= 1/33
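
The four conditional probabilities in Self-Test Problem 3.7 can be confirmed by enumerating every ordered pair of cards. The sketch below is illustrative: it assumes a standard 52-card deck, encodes the ace as rank 0, and reads part (b) as conditioning on the first card being an ace, which is what the solution's argument uses.

```python
from fractions import Fraction
from itertools import permutations

# A card is (rank, suit); rank 0 stands for the ace.
deck = [(rank, suit) for rank in range(13) for suit in "SHDC"]
hands = list(permutations(deck, 2))  # ordered (first card, second card)

def is_ace(card):
    return card[0] == 0

def both_aces(hand):
    return is_ace(hand[0]) and is_ace(hand[1])

def cond(event, given):
    """P(event | given) over equally likely ordered two-card hands."""
    selected = [h for h in hands if given(h)]
    return Fraction(sum(1 for h in selected if event(h)), len(selected))

a = cond(both_aces, lambda h: (0, "S") in h)                    # has ace of spades
b = cond(both_aces, lambda h: is_ace(h[0]))                     # first card is an ace
c = cond(both_aces, lambda h: is_ace(h[1]))                     # second card is an ace
d = cond(both_aces, lambda h: is_ace(h[0]) or is_ace(h[1]))     # at least one ace
print(a, b, c, d)
```

Parts (a), (b), and (c) all come out to 3/51 = 1/17, and part (d) to 1/33.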


3.8. P(H|E)/P(G|E) = P(HE)/P(GE) = P(H)P(E|H) / [P(G)P(E|G)]

Hypothesis H is 1.5 times as likely.
3.9. Let A denote the event that the plant is alive and let W be the event
that it was watered.

a. P(A) = P(A|W)P(W) + P(A|W^c)P(W^c)
= (.85)(.9) + (.2)(.1) = .785
b. P(W^c|A^c) = P(A^c|W^c)P(W^c)/P(A^c)
= (.8)(.1)/.215 = 16/43

3.10.
a. Let R be the event that at least one red ball is chosen. Then

P(R) = 1 - P(R^c) = 1 - \binom{22}{6}/\binom{30}{6}

b. Let G_2 be the event there are exactly 2 green balls chosen. Working
with the reduced sample space yields

P(G_2|R^c) = \binom{10}{2}\binom{12}{4}/\binom{22}{6}

3.11. Let W be the event that the battery works, and let C and D denote the
events that the battery is a type C and that it is a type D battery, respectively.
a. P(W) = P(W|C)P(C) + P(W|D)P(D) = .7(8/14) + .4(6/14) = 4/7
b. P(C|W^c) = P(CW^c)/P(W^c) = P(W^c|C)P(C)/P(W^c) = .3(8/14)/(3/7) = .4

3.12. Let L_i be the event that Maria likes book i, i = 1, 2. Then

P(L_2|L_1^c) = P(L_1^c L_2)/P(L_1^c) = P(L_1^c L_2)/.4

Using that L_2 is the union of the mutually exclusive events L_1 L_2 and
L_1^c L_2, we see that

.5 = P(L_2) = P(L_1 L_2) + P(L_1^c L_2) = .4 + P(L_1^c L_2)
Thus,
P(L_2|L_1^c) = .1/.4 = .25
3.13.


a. This is the probability that the last ball removed is blue. Because each
of the 30 balls is equally likely to be the last one removed, the
probability is 1/3.
b. This is the probability that the last red or blue ball to be removed is a
blue ball. Because it is equally likely to be any of the 30 red or blue
balls, the probability that it is blue is 1/3.
c. Let B_1, R_2, G_3 denote, respectively, the events that the first color
removed is blue, the second is red, and the third is green. Then

P(B_1 R_2 G_3) = P(G_3)P(R_2|G_3)P(B_1|R_2 G_3) = (8/38)(20/30) = 8/57

where P(G_3) is just the probability that the very last ball is green and
P(R_2|G_3) is computed by noting that given that the last ball is green,
each of the 20 red and 10 blue balls is equally likely to be the last of
that group to be removed, so the probability that it is one of the red
balls is 20/30. (Of course, P(B_1|R_2 G_3) = 1.)
d. P(B_1) = P(B_1 G_2 R_3) + P(B_1 R_2 G_3) = (20/38)(8/18) + 8/57 = 64/171
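
The "last ball" shortcuts in Self-Test Problem 3.13 can be cross-checked by computing the exact distribution of the order in which the colors are exhausted. This is a sketch, not the book's method: it assumes the urn holds 20 red, 10 blue, and 8 green balls (the counts that appear in the solution), and the function `order_probs` and the color indices are illustrative.

```python
from fractions import Fraction
from functools import lru_cache

RED, BLUE, GREEN = 0, 1, 2

@lru_cache(maxsize=None)
def order_probs(counts):
    """Map each depletion order of the colors still present to its probability."""
    present = [i for i, c in enumerate(counts) if c > 0]
    if len(present) == 1:
        return {(present[0],): Fraction(1)}
    total = sum(counts)
    result = {}
    for i in present:
        p_draw = Fraction(counts[i], total)
        nxt = counts[:i] + (counts[i] - 1,) + counts[i + 1:]
        for order, q in order_probs(nxt).items():
            # If this draw used up color i, it is the next color depleted.
            full = (i,) + order if nxt[i] == 0 else order
            result[full] = result.get(full, 0) + p_draw * q
    return result

probs = order_probs((20, 10, 8))
p_red_before_blue = sum(
    q for order, q in probs.items() if order.index(RED) < order.index(BLUE)
)
p_blue_red_green = probs[(BLUE, RED, GREEN)]
p_blue_first = sum(q for order, q in probs.items() if order[0] == BLUE)
print(p_red_before_blue, p_blue_red_green, p_blue_first)
```

The exact values are 1/3, 8/57, and 64/171, matching parts (b), (c), and (d); calling `order_probs((20, 10, 0))` gives the two-color case of part (a).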
3.14. Let H be the event that the coin lands heads, let T_h be the event that
B is told that the coin landed heads, let F be the event that A forgets the result
of the toss, and let C be the event that B is told the correct result. Then

a. P(T_h) = P(T_h|F)P(F) + P(T_h|F^c)P(F^c)
= (.5)(.4) + P(H)(.6)
= .68
b. P(C) = P(C|F)P(F) + P(C|F^c)P(F^c)
= (.5)(.4) + 1(.6) = .80
c. P(H|T_h) = P(HT_h)/P(T_h)
Now,
P(HT_h) = P(HT_h|F)P(F) + P(HT_h|F^c)P(F^c)
= P(H|F)P(T_h|HF)P(F) + P(H)P(F^c)
= (.8)(.5)(.4) + (.8)(.6) = .64

giving the result P(H|T_h) = .64/.68 = 16/17.


3.15. Since the black rat has a brown sibling, we can conclude that both of
its parents have one black and one brown gene.
a. P(2 black | at least one) = P(2)/P(at least one) = (1/4)/(3/4) = 1/3
b. Let F be the event that all 5 offspring are black, let B_2 be the event that
the black rat has 2 black genes, and let B_1 be the event that it has 1
black and 1 brown gene. Then

P(B_2|F) = P(F|B_2)P(B_2) / [P(F|B_2)P(B_2) + P(F|B_1)P(B_1)]
= (1)(1/3) / [(1)(1/3) + (1/2)^5 (2/3)] = 16/17


3.16. Let F be the event that a current flows from A to B, and let C_i be the
event that relay i closes. Then

P(F) = P(F|C_1)p_1 + P(F|C_1^c)(1 - p_1)

Now,
P(F|C_1) = P(C_4 ∪ C_2 C_5 ∪ C_3 C_5)
= p_4 + p_2 p_5 + p_3 p_5 - p_4 p_2 p_5 - p_4 p_3 p_5 - p_2 p_3 p_5 + p_4 p_2 p_5 p_3

Also,

P(F|C_1^c) = P(C_2 C_5 ∪ C_2 C_3 C_4)
= p_2 p_5 + p_2 p_3 p_4 - p_2 p_3 p_4 p_5

Hence, for part (a), we obtain
P(F) = p_1(p_4 + p_2 p_5 + p_3 p_5 - p_4 p_2 p_5 - p_4 p_3 p_5 - p_2 p_3 p_5 + p_4 p_2 p_5 p_3)
+ (1 - p_1) p_2 (p_5 + p_3 p_4 - p_3 p_4 p_5)

For part (b), let q_i = 1 - p_i. Then
P(C_3|F) = P(F|C_3)P(C_3)/P(F)
= p_3[1 - P(C_1^c C_2^c ∪ C_4^c C_5^c)]/P(F)
= p_3(1 - q_1 q_2 - q_4 q_5 + q_1 q_2 q_4 q_5)/P(F)


3.17. Let A be the event that component 1 is working, and let F be the
event that the system functions.
a. P(A|F) = P(AF)/P(F) = P(A)/P(F) = (1/2)/[1 - (1/2)^2] = 2/3
where P(F) was computed by noting that it is equal to 1 minus the
probability that components 1 and 2 are both failed.
b. P(A|F) = P(AF)/P(F) = P(F|A)P(A)/P(F) = (3/4)(1/2)/[(1/2)^3 + 3(1/2)^3] = 3/4
where P(F) was computed by noting that it is equal to the probability
that all 3 components work plus the three probabilities relating to
exactly 2 of the components working.


3.18. If we assume that the outcomes of the successive spins are 
independent, then the conditional probability of the next outcome is 
unchanged by the result that the previous 10 spins landed on black. 


3.19. Condition on the outcome of the initial tosses:
P(A odd) = P_1(1 - P_2)(1 - P_3) + (1 - P_1)P_2 P_3
+ P_1 P_2 P_3 P(A odd)
+ (1 - P_1)(1 - P_2)(1 - P_3)P(A odd)

so,
P(A odd) = [P_1(1 - P_2)(1 - P_3) + (1 - P_1)P_2 P_3] / [P_1 + P_2 + P_3 - P_1 P_2 - P_1 P_3 - P_2 P_3]


3.20. Let A and B be the events that the first trial is larger and that the
second is larger, respectively. Also, let E be the event that the results of the
trials are equal. Then
1 = P(A) + P(B) + P(E)

But, by symmetry, P(A) = P(B): thus,

P(B) = [1 - P(E)]/2 = (1 - \sum_i p_i^2)/2

Another way of solving the problem is to note that

P(B) = \sum_i \sum_{j>i} P{first trial results in i, second trial results in j}
= \sum_i \sum_{j>i} p_i p_j

To see that the two expressions derived for P(B) are equal, observe that

1 = \sum_i p_i \sum_j p_j
= \sum_i \sum_j p_i p_j
= \sum_i p_i^2 + \sum_i \sum_{j≠i} p_i p_j
= \sum_i p_i^2 + 2 \sum_i \sum_{j>i} p_i p_j


3.21. Let E = {A gets more heads than B}; then
P(E) = P(E|A leads after both flip n) P(A leads after both flip n)
+ P(E|even after both flip n) P(even after both flip n)
+ P(E|B leads after both flip n) P(B leads after both flip n)
= P(A leads) + (1/2) P(even)

Now, by symmetry,
P(A leads) = P(B leads) = [1 - P(even)]/2
Hence,
P(E) = 1/2
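
The symmetry argument in Self-Test Problem 3.21 gives 1/2 for every n, which can be verified exactly for small n. This sketch assumes the setting implied by the solution: fair coins, with A flipping n + 1 times and B flipping n times. The function name is illustrative.

```python
from fractions import Fraction
from math import comb

def p_a_more_heads(n):
    """Exact P(A's heads in n+1 fair flips exceed B's heads in n fair flips)."""
    favorable = sum(
        comb(n + 1, a) * comb(n, b)
        for a in range(n + 2)
        for b in range(n + 1)
        if a > b
    )
    return Fraction(favorable, 2 ** (2 * n + 1))

print([p_a_more_heads(n) for n in range(6)])
```

Every entry of the printed list is 1/2, in agreement with the conditioning argument.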
3.22.

a. Not true: In rolling 2 dice, let E = {sum is 7},
F = {1st die does not land on 4}, and G = {2nd die does not land on 3}.
Then
P(E|F ∪ G) = P{7, not (4, 3)}/P{not (4, 3)} = (5/36)/(35/36) = 5/35 ≠ P(E)
b. P(E(F ∪ G)) = P(EF ∪ EG)
= P(EF) + P(EG) since EFG = ∅
= P(E)[P(F) + P(G)]
= P(E)P(F ∪ G) since FG = ∅
c. P(G|EF) = P(EFG)/P(EF)
= P(E)P(FG)/P(EF) since E is independent of FG
= P(E)P(F)P(G)/[P(E)P(F)] by independence
= P(G).
3.23. 


a. necessarily false; if they were mutually exclusive, then we would have
0 = P(AB) ≠ P(A)P(B)


b. necessarily false; if they were independent, then we would have 
P(AB) = P(A)P(B) > 0 


c. necessarily false; if they were mutually exclusive, then we would have 
P(A UB) = P(A) + P(B) = 1.2 


d. possibly true 


3.24. The probabilities in parts (a), (b), and (c) are .5, (.8)^3 = .512, and
(.9)^7 = .4783, respectively.
3.25. Let D_i, i = 1, 2, denote the event that radio i is defective. Also, let A
and B be the events that the radios were produced at factory A and at factory
B, respectively. Then

P(D_2|D_1) = P(D_1 D_2)/P(D_1)
= [P(D_1 D_2|A)P(A) + P(D_1 D_2|B)P(B)] / [P(D_1|A)P(A) + P(D_1|B)P(B)]
= [(.05)^2 (1/2) + (.01)^2 (1/2)] / [(.05)(1/2) + (.01)(1/2)]
= 13/300


3.26. We are given that P(AB) = P(B) and must show that this implies that
P(B^c A^c) = P(A^c). One way is as follows:
P(B^c A^c) = P((A ∪ B)^c)
= 1 - P(A ∪ B)
= 1 - P(A) - P(B) + P(AB)
= 1 - P(A)
= P(A^c)


3.27. The result is true for n = 0. With A_i denoting the event that there are i
red balls in the urn after stage n, assume that

P(A_i) = 1/(n + 1), i = 1, ..., n + 1

Now let B_j, j = 1, ..., n + 2, denote the event that there are j red balls in the
urn after stage n + 1. Then

P(B_j) = \sum_{i=1}^{n+1} P(B_j|A_i)P(A_i)
= [1/(n + 1)] \sum_{i=1}^{n+1} P(B_j|A_i)
= [1/(n + 1)] [P(B_j|A_{j-1}) + P(B_j|A_j)]

Because there are n + 2 balls in the urn after stage n, it follows that
P(B_j|A_{j-1}) is the probability that a red ball is chosen when j - 1 of the n + 2
balls in the urn are red and P(B_j|A_j) is the probability that a red ball is not
chosen when j of the n + 2 balls in the urn are red. Consequently,

P(B_j|A_{j-1}) = (j - 1)/(n + 2), P(B_j|A_j) = (n + 2 - j)/(n + 2)

Substituting these results into the equation for P(B_j) gives

P(B_j) = [1/(n + 1)] [(j - 1)/(n + 2) + (n + 2 - j)/(n + 2)] = 1/(n + 2)

This completes the induction proof.
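
The induction in Self-Test Problem 3.27 says the number of red balls after stage n is uniform on 1, ..., n + 1, and this can be checked by propagating the exact distribution stage by stage. The sketch below assumes the setup the solution implies: the urn starts with 1 red ball and 1 ball of another color, and at each stage a ball is drawn at random and returned together with one more of the same color. The function name is illustrative.

```python
from fractions import Fraction

def red_distribution(stages):
    """Exact distribution of the number of red balls after the given number of stages."""
    dist = {1: Fraction(1)}  # stage 0: one red ball among two
    for n in range(stages):
        balls = n + 2  # urn size before the draw of stage n + 1
        nxt = {}
        for reds, p in dist.items():
            # A red draw adds a red ball; any other draw leaves the red count alone.
            nxt[reds + 1] = nxt.get(reds + 1, 0) + p * Fraction(reds, balls)
            nxt[reds] = nxt.get(reds, 0) + p * Fraction(balls - reds, balls)
        dist = nxt
    return dist

print(red_distribution(3))
```

After 3 stages each of the counts 1, 2, 3, 4 has probability 1/4, as the induction predicts.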
3.28. If A_i is the event that player i receives an ace, then

P(A_i) = 1 - \binom{2n-2}{n}/\binom{2n}{n} = 1 - (1/2)(n - 1)/(2n - 1) = (3n - 1)/(4n - 2)

By arbitrarily numbering the aces and noting that the player who does not
receive ace number one will receive n of the remaining 2n - 1 cards, we see
that

P(A_1 A_2) = n/(2n - 1)

Therefore,

P(A_2^c|A_1) = 1 - P(A_2|A_1) = 1 - P(A_1 A_2)/P(A_1) = (n - 1)/(3n - 1)

We may regard the card division outcome as the result of two trials, where
trial i, i = 1, 2, is said to be a success if ace number i goes to the first player.
Because the locations of the two aces become independent as n goes to
infinity, with each one being equally likely to be given to either player, it follows
that the trials become independent, each being a success with probability 1/2.
Hence, in the limiting case where n → ∞, the problem becomes one of
determining the conditional probability that two heads result, given that at
least one does, when two fair coins are flipped. Because (n - 1)/(3n - 1)
converges to 1/3, the answer agrees with that of Example 2b.
3.29.
a. For any permutation i_1, ..., i_n of 1, 2, ..., n, the probability that the
successive types collected is i_1, ..., i_n is p_{i_1} ··· p_{i_n} = \prod_{i=1}^{n} p_i.
Consequently, the desired probability is n! \prod_{i=1}^{n} p_i.
b. For i_1, ..., i_k all distinct,

P(E_{i_1} ··· E_{i_k}) = ((n - k)/n)^n

which follows because there are no coupons of types i_1, ..., i_k when
each of the n independent selections is one of the other n - k types. It
now follows by the inclusion-exclusion identity that

P(∪_{i=1}^{n} E_i) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} ((n - k)/n)^n

Because 1 - P(∪_{i=1}^{n} E_i) is the probability that one of each type is
obtained, by part (a) it is equal to n!/n^n. Substituting this into the
preceding equation gives

1 - n!/n^n = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} ((n - k)/n)^n

or

n! = n^n - \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} (n - k)^n

or

n! = \sum_{k=0}^{n} (-1)^k \binom{n}{k} (n - k)^n

3.30.
P(E|E ∪ F) = P(E|F(E ∪ F))P(F|E ∪ F) + P(E|F^c(E ∪ F))P(F^c|E ∪ F)
Using
F(E ∪ F) = F and F^c(E ∪ F) = F^c E
gives
P(E|E ∪ F) = P(E|F)P(F|E ∪ F) + P(E|E F^c)P(F^c|E ∪ F)
= P(E|F)P(F|E ∪ F) + P(F^c|E ∪ F)
≥ P(E|F)P(F|E ∪ F) + P(E|F)P(F^c|E ∪ F)
= P(E|F)
3.31.
a. 2/5 
b. 5/6 
3.32. 
a. 1/7 
b. 1/6 


3.33.
P(E|FG^c) = P(EFG^c)/P(FG^c)
= [P(EF) - P(EFG)] / [P(F) - P(FG)]
= [P(E)P(F) - P(E)P(F)P(G)] / [P(F) - P(F)P(G)]
= P(E)

The second equality in the preceding used that EF = EFG ∪ EFG^c.
3.34. Let W_1 be the event that player 1 wins the contest. Letting O be the
event that player 1 does not play in round 1, we obtain by conditioning on
whether or not O occurs, that
P(W_1) = P(W_1|O)P(O) + P(W_1|O^c)P(O^c)
= P(W_1|O)(1/3) + (1/3)(1/4)(2/3)
where the preceding used that if O^c occurs then 1 would have to beat both 2
and 3 to win the tournament. To compute P(W_1|O), condition on which of 2 or
3 wins the first game. Letting B_i be the event that i wins the first game
P(W_1|O) = P(W_1|O, B_2)P(B_2|O) + P(W_1|O, B_3)P(B_3|O)
= (1/3)(2/5) + (1/4)(3/5) = 17/60

Hence, P(W_1) = 3/20. Also,

P(O|W_1) = P(W_1|O)P(O)/P(W_1) = (17/60)(1/3)/(3/20) = 17/27

3.35. P(all white | same) = P(all white)/P(same)
Now, 
() a) * (a) * (a) +04) 
+({  }+ + 
P(all white) = Bia P(same) = z : z 
22\ 22 
4 4 
giving that 


P(all white | same ) = ———->_.—_~_ = =>
3.36. Let B_3 be the event that 3 beats 4. Because 1 beats 2 with
probability 1/3,
P(1) = P(1|B_3)P(B_3) + P(1|B_3^c)P(B_3^c)
= (1/3)(1/4)(3/7) + (1/3)(1/5)(4/7) = 31/420


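3.37.
a. Condition on who wins the first game to obtain:
$P(W_3) = P(W_3 \mid 1 \text{ wins})(1/3) + P(W_3 \mid 2 \text{ wins})(2/3)$
$= (1/3)(3/4)\prod_{i=4}^{n} \frac{3}{3+i} + (2/3)(3/5)\prod_{i=4}^{n} \frac{3}{3+i}$
$= \frac{13}{20}\prod_{i=4}^{n} \frac{3}{3+i}$

b. Condition on the opponent of player 4. If $O_i$ is the event that i is the
opponent, i = 1, 2, 3, then
$P(O_1) = \frac{1}{3}\cdot\frac{1}{4} = \frac{1}{12}$
$P(O_2) = \frac{2}{3}\cdot\frac{2}{5} = \frac{4}{15}$
$P(O_3) = 1 - \frac{1}{12} - \frac{4}{15} = \frac{13}{20}$

Hence,
$P(W_4) = \sum_{i=1}^{3} P(W_4 \mid O_i)P(O_i) = \frac{4}{5}\cdot\frac{1}{12} + \frac{4}{6}\cdot\frac{4}{15} + \frac{4}{7}\cdot\frac{13}{20} = \frac{194}{315}$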


Chapter 4 


4.1. Since the probabilities sum to 1, we must have 

4P{X = 3} +.5 = 1, implying that 

P{X = 0} = .375,P{X = 3} = .125. Hence, 

E[X] = 1(.3) + 2(.2) + 3(.125) = 1.075. 

4.2. The relationship implies that $p_i = c^i p_0$, i = 1, 2, where
$p_i = P\{X = i\}$. Because these probabilities sum to 1, it follows that
$p_0(1 + c + c^2) = 1 \Rightarrow p_0 = \frac{1}{1 + c + c^2}$
Hence,
$E[X] = p_1 + 2p_2 = \frac{c + 2c^2}{1 + c + c^2}$
4.3. Let X be the number of flips. Then the probability mass function of
X is
$p_2 = p^2 + (1-p)^2, \quad p_3 = 1 - p_2 = 2p(1-p)$

Hence,
$E[X] = 2p_2 + 3p_3 = 2p_2 + 3(1 - p_2) = 3 - p^2 - (1-p)^2$


4.4. The probability that a randomly chosen family will have i children
is $n_i/m$. Thus,
$E[X] = \sum_{i=1}^{r} i n_i / m$

Also, since there are $i n_i$ children in families having i children, it follows
that the probability that a randomly chosen child is from a family with i
children is $i n_i / \sum_{i=1}^{r} i n_i$. Therefore,
$E[Y] = \frac{\sum_{i=1}^{r} i^2 n_i}{\sum_{i=1}^{r} i n_i}$

Thus, we must show that
$\frac{\sum_{i=1}^{r} i^2 n_i}{\sum_{i=1}^{r} i n_i} \ge \frac{\sum_{i=1}^{r} i n_i}{\sum_{i=1}^{r} n_i}$

or, equivalently, that
$\sum_{j=1}^{r} n_j \sum_{i=1}^{r} i^2 n_i \ge \sum_{i=1}^{r} i n_i \sum_{j=1}^{r} j n_j$

or, equivalently, that
$\sum_{i=1}^{r}\sum_{j=1}^{r} i^2 n_i n_j \ge \sum_{i=1}^{r}\sum_{j=1}^{r} i j\, n_i n_j$

But, for a fixed pair i, j, the coefficient of $n_i n_j$ in the left-side summation of
the preceding inequality is $i^2 + j^2$, whereas its coefficient in the right-hand
summation is 2ij. Hence, it suffices to show that
$i^2 + j^2 \ge 2ij$

which follows because $(i - j)^2 \ge 0$.
4.5. Let p = P{X = 1}. Then E[X] = p and Var(X) = p(1 — p), so 
p = 3p(1 — p) 


implying that p = 2/3. Hence, P{X = 0} = 1/3. 
4.6. If you wager x on a bet that wins the amount wagered with 
probability p and loses that amount with probability 1 — p, then your 


expected winnings are 
xp —x(1—p) = (2p — 1)x 


which is positive (and increasing in x) if and only if p > 1/2. Thus, if 

p < 1/2, one maximizes one’s expected return by wagering 0, and if 

p > 1/2, one maximizes one’s expected return by wagering the maximal 
possible bet. Therefore, if the information is that the .6 coin was chosen, 
then you should bet 10; if the information is that the .3 coin was chosen, 
then you should bet 0. Hence, your expected payoff is
$\frac{1}{2}(1.2 - 1)10 + \frac{1}{2}\cdot 0 - C = 1 - C$

Since your expected payoff is 0 without the information (because in this
case the probability of winning is $\frac{1}{2}(.6) + \frac{1}{2}(.3) < 1/2$), it follows that if the
information costs less than 1, then it pays to purchase it.
4.7. 
a. If you turn over the red paper and observe the value x, then your 


expected return if you switch to the blue paper is 
2x(1/2) + x/2(1/2) =5x/4>x 


Thus, it would always be better to switch. 

b. Suppose the philanthropist writes the amount x on the red paper.
Then the amount on the blue paper is either 2x or x/2. Note that if
$x/2 \ge y$, then the amount on the blue paper will be at least y and
will thus be accepted. Hence, in this case, the reward is equally
likely to be either 2x or x/2, so
$E[R_y(x)] = 5x/4, \quad \text{if } x/2 \ge y$

If $x/2 < y \le 2x$, then the blue paper will be accepted if its value is
2x and rejected if it is x/2. Therefore,
$E[R_y(x)] = 2x(1/2) + x(1/2) = 3x/2, \quad \text{if } x/2 < y \le 2x$
Finally, if 2x < y, then the blue paper will be rejected. Hence, in
this case, the reward is x, so
$R_y(x) = x, \quad \text{if } 2x < y$

That is, we have shown that when the amount x is written on the
red paper, the expected return under the y-policy is
$E[R_y(x)] = x$ if $x < y/2$
$E[R_y(x)] = 3x/2$ if $y/2 \le x < 2y$
$E[R_y(x)] = 5x/4$ if $x \ge 2y$


4.8. | Suppose that n independent trials, each of which results in a 
success with probability p, are performed. Then the number of successes 
will be less than or equal to i if and only if the number of failures is 
greater than or equal to n — i. But since each trial is a failure with 
probability 1 — p, it follows that the number of failures is a binomial 
random variable with parameters n and 1 — p. Hence, 

$P\{\mathrm{Bin}(n,p) \le i\} = P\{\mathrm{Bin}(n,1-p) \ge n - i\}$
$= 1 - P\{\mathrm{Bin}(n,1-p) \le n - i - 1\}$


The final equality follows from the fact that the probability that the number 
of failures is greater than or equal to n — iis 1 minus the probability that it 
is less than n — i. 
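
The identity in 4.8, $P\{\mathrm{Bin}(n,p) \le i\} = 1 - P\{\mathrm{Bin}(n,1-p) \le n-i-1\}$, can be checked numerically. A minimal sketch follows; the helper `binom_cdf` is our own name, not notation from the text.

```python
from math import comb


def binom_cdf(n: int, p: float, i: int) -> float:
    """P{Bin(n, p) <= i}; returns 0.0 when i is negative."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i + 1))


# Compare both sides of the identity for every i in 0..n.
n, p = 12, 0.3
for i in range(n + 1):
    left = binom_cdf(n, p, i)
    right = 1 - binom_cdf(n, 1 - p, n - i - 1)
    assert abs(left - right) < 1e-12
```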

4.9. Since E[X] = np, Var(X) = np(1 — p), we are given that 

np = 6,np(1 — p) = 2.4. Thus, 1 — p = .4, or p = .6,n = 10. Hence, 


$P\{X = 5\} = \binom{10}{5}(.6)^5(.4)^5$


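4.10. Let $X_i$, i = 1, ..., m, denote the number on the ith ball drawn.
Then
$P\{X \le k\} = P\{X_1 \le k, X_2 \le k, \ldots, X_m \le k\}$
$= P\{X_1 \le k\}P\{X_2 \le k\}\cdots P\{X_m \le k\}$
$= \left(\frac{k}{n}\right)^m$

Therefore,
$P\{X = k\} = P\{X \le k\} - P\{X \le k-1\} = \left(\frac{k}{n}\right)^m - \left(\frac{k-1}{n}\right)^m$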


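4.11.
a. Given that A wins the first game, it will win the series if, from then
on, it wins 2 games before team B wins 3 games. Thus,
$P\{A \text{ wins} \mid A \text{ wins first}\} = \sum_{i=2}^{4} \binom{4}{i} p^i (1-p)^{4-i}$

b. $P\{A \text{ wins first} \mid A \text{ wins}\} = \frac{P\{A \text{ wins} \mid A \text{ wins first}\}P\{A \text{ wins first}\}}{P\{A \text{ wins}\}}$
$= \frac{\sum_{i=2}^{4} \binom{4}{i} p^{i+1}(1-p)^{4-i}}{\sum_{i=3}^{5} \binom{5}{i} p^i (1-p)^{5-i}}$

4.12. To obtain the solution, condition on whether the team wins this
weekend:
$.5\sum_{i=3}^{4} \binom{4}{i}(.4)^i(.6)^{4-i} + .5\sum_{i=3}^{4} \binom{4}{i}(.7)^i(.3)^{4-i}$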


4.13. Let C be the event that the jury makes the correct decision, and
let F be the event that four of the judges agreed. Then
$P(C) = \sum_{i=4}^{7} \binom{7}{i}(.7)^i(.3)^{7-i}$

Also,
$P(C \mid F) = \frac{P(CF)}{P(F)} = \frac{\binom{7}{4}(.7)^4(.3)^3}{\binom{7}{4}(.7)^4(.3)^3 + \binom{7}{3}(.7)^3(.3)^4} = .7$


4.14. | Assuming that the number of hurricanes can be approximated by 


a Poisson random variable, we obtain the solution 
$\sum_{i=0}^{3} e^{-5.2}(5.2)^i / i!$

4.15.
$E[Y] = \sum_{i=1}^{\infty} i P\{X = i\}/P\{X > 0\}$
$= E[X]/P\{X > 0\}$
$= \frac{\lambda}{1 - e^{-\lambda}}$
4.16. 
a. 1/n 


b. Let D be the event that girl i and girl j choose different boys. Then
$P(G_iG_j) = P(G_iG_j \mid D)P(D) + P(G_iG_j \mid D^c)P(D^c)$
$= (1/n)^2(1 - 1/n)$
$= \frac{n-1}{n^3}$

Therefore,
$P(G_i \mid G_j) = \frac{n-1}{n^2}$

c., d. Because, when n is large, $P(G_i \mid G_j)$ is small and nearly equal to
$P(G_i)$, it follows from the Poisson paradigm that the number of
couples is approximately Poisson distributed with mean
$\sum_{i=1}^{n} P(G_i) = 1$. Hence, $P_0 \approx e^{-1}$ and $P_k \approx e^{-1}/k!$
e. To determine the probability that a given set of k girls all are
coupled, condition on whether or not D occurs, where D is the
event that they all choose different boys. This gives
$P(G_{i_1}\cdots G_{i_k}) = P(G_{i_1}\cdots G_{i_k} \mid D)P(D) + P(G_{i_1}\cdots G_{i_k} \mid D^c)P(D^c)$
$= P(G_{i_1}\cdots G_{i_k} \mid D)P(D)$
$= (1/n)^k \frac{n(n-1)\cdots(n-k+1)}{n^k}$
$= \frac{n!}{(n-k)!\,n^{2k}}$

Therefore,
$\sum_{i_1 < \cdots < i_k} P(G_{i_1}\cdots G_{i_k}) = \binom{n}{k}P(G_{i_1}\cdots G_{i_k}) = \frac{n!\,n!}{(n-k)!\,(n-k)!\,k!\,n^{2k}}$

and the inclusion-exclusion identity yields
$1 - P_0 = P\left(\bigcup_{i=1}^{n} G_i\right) = \sum_{k=1}^{n} (-1)^{k+1}\frac{n!\,n!}{(n-k)!\,(n-k)!\,k!\,n^{2k}}$
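
The exact inclusion-exclusion expression for $1 - P_0$ in 4.16(e) can be compared with the Poisson approximation $P_0 \approx e^{-1}$ from parts (c) and (d). The sketch below uses exact fractions; `prob_no_couples` is a hypothetical name chosen for illustration.

```python
from fractions import Fraction
from math import exp, factorial


def prob_no_couples(n: int) -> Fraction:
    """Exact P_0 from the inclusion-exclusion sum over k = 1, ..., n."""
    union = Fraction(0)
    for k in range(1, n + 1):
        term = Fraction(
            factorial(n) ** 2,
            factorial(n - k) ** 2 * factorial(k) * n ** (2 * k),
        )
        union += (-1) ** (k + 1) * term
    return 1 - union


# The exact value should approach exp(-1) as n grows.
for n in (2, 10, 50):
    print(n, float(prob_no_couples(n)), exp(-1))
```

For n = 2 the exact value is 1/8, which can also be confirmed by listing the 16 equally likely choice patterns directly.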


4.17. 
a. Because woman i is equally likely to be paired with any of the
remaining 2n − 1 people, $P(W_i) = \frac{1}{2n-1}$

b. Because, conditional on $W_j$, woman i is equally likely to be paired
with any of 2n − 3 people, $P(W_i \mid W_j) = \frac{1}{2n-3}$

c. When n is large, the number of wives paired with their husbands
will approximately be Poisson with mean
$\sum_{i=1}^{n} P(W_i) = \frac{n}{2n-1} \approx 1/2$. Therefore, the probability that there
is no such pairing is approximately $e^{-1/2}$.


d. It reduces to the match problem. 


4.18. 
a. $\binom{8}{3}(9/19)^3(10/19)^5(9/19) = \binom{8}{3}(9/19)^4(10/19)^5$


b. If W is her final winnings and X is the number of bets she makes, 
then, since she would have won 4 bets and lost X — 4 bets, it 


follows that 
W = 20-—5(X—4) = 40-5X 


Hence, 
E[W] = 40 — 5E[X] = 40 — 5[4/(9/19)] = — 20/9 


4.19. | The probability that a round does not result in an “odd person” is 
equal to 1/4, the probability that all three coins land on the same side. 

a. $(1/4)^2(3/4) = 3/64$

b. $(1/4)^4 = 1/256$


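4.20. Let q = 1 − p. Then
$E[1/X] = \sum_{i=1}^{\infty} \frac{1}{i} q^{i-1} p$
$= \frac{p}{q}\sum_{i=1}^{\infty} \frac{q^i}{i}$
$= \frac{p}{q}\int_0^q \frac{1}{1-x}\,dx$
$= \frac{p}{q}\int_p^1 \frac{1}{y}\,dy$
$= -\frac{p}{q}\log(p)$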


4.21. Since $\frac{X-b}{a-b}$ will equal 1 with probability p or 0 with probability
1 − p, it follows that it is a Bernoulli random variable with parameter p.
Because the variance of such a Bernoulli random variable is p(1 − p), we
have
$p(1-p) = \mathrm{Var}\left(\frac{X-b}{a-b}\right) = \frac{1}{(a-b)^2}\mathrm{Var}(X-b) = \frac{1}{(a-b)^2}\mathrm{Var}(X)$
Hence,
$\mathrm{Var}(X) = (a-b)^2 p(1-p)$


4.22. Let X denote the number of games that you play and Y the 
number of games that you lose. 
a. After your fourth game, you will continue to play until you lose. 
Therefore, X — 4 is a geometric random variable with parameter 
1—p,so 


$E[X] = E[4 + (X - 4)] = 4 + E[X - 4] = 4 + \frac{1}{1-p}$

b. If we let Z denote the number of losses you have in the first 4
games, then Z is a binomial random variable with parameters 4
and 1 − p. Because Y = Z + 1, we have
$E[Y] = E[Z + 1] = E[Z] + 1 = 4(1-p) + 1$


4.23. A total of n white balls will be withdrawn before a total of m black 
balls if and only if there are at least n white balls in the first n + m— 1 
withdrawals. (Compare with the problem of the points, Example 4j of 
Chapter 3.) With X equal to the number of white balls among the first 
n +m -— 1 balls withdrawn, X is a hypergeometric random variable, and it 


follows that
$P\{X \ge n\} = \sum_{i=n}^{n+m-1} \frac{\binom{N}{i}\binom{M}{n+m-1-i}}{\binom{N+M}{n+m-1}}$


4.24. Because each ball independently goes into urn i with the same
probability $p_i$, it follows that $X_i$ is a binomial random variable with
parameters n = 10, p = $p_i$.

First note that $X_i + X_j$ is the number of balls that go into either urn i or urn
j. Then, because each of the 10 balls independently goes into one of
these urns with probability $p_i + p_j$, it follows that $X_i + X_j$ is a binomial
random variable with parameters 10 and $p_i + p_j$.

By the same logic, $X_1 + X_2 + X_3$ is a binomial random variable with
parameters 10 and $p_1 + p_2 + p_3$. Therefore,
$P\{X_1 + X_2 + X_3 = 7\} = \binom{10}{7}(p_1 + p_2 + p_3)^7(p_4 + p_5)^3$


4.25. Let X; equal 1 if person i has a match, and let it equal 0 
otherwise. Then 


$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} P\{X_i = 1\} = \sum_{i=1}^{n} 1/n = 1$

where the final equality follows because person i is equally likely to end
up with any of the n hats.
To compute Var(X), we use Equation (9.1), which states that
$E[X^2] = \sum_{i=1}^{n} E[X_i] + \sum_{i=1}^{n}\sum_{j \ne i} E[X_iX_j]$

Now, for $i \ne j$,
$E[X_iX_j] = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}P\{X_j = 1 \mid X_i = 1\} = \frac{1}{n}\cdot\frac{1}{n-1}$
Hence,
$E[X^2] = 1 + \sum_{i=1}^{n}\sum_{j \ne i} \frac{1}{n(n-1)} = 1 + n(n-1)\frac{1}{n(n-1)} = 2$
which yields
$\mathrm{Var}(X) = 2 - 1^2 = 1$
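
For small n, the mean and variance of the number of matches in 4.25 can be confirmed by listing every permutation of the hats. This is an illustrative brute-force sketch (the function name `match_moments` is ours); note that the variance equals 1 only for n at least 2, since the argument uses pairs i ≠ j.

```python
from fractions import Fraction
from itertools import permutations


def match_moments(n: int) -> tuple[Fraction, Fraction]:
    """Exact mean and variance of the number of fixed points of a random permutation."""
    counts = [
        sum(1 for person, hat in enumerate(perm) if person == hat)
        for perm in permutations(range(n))
    ]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    second_moment = Fraction(sum(c * c for c in counts), total)
    return mean, second_moment - mean**2


for n in range(2, 7):
    print(n, match_moments(n))
```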


4.26. With q = 1 — p, we have, on the one hand, 


$P(E) = \sum_{i=1}^{\infty} P\{X = 2i\}$
$= \sum_{i=1}^{\infty} p q^{2i-1}$
$= pq\sum_{i=1}^{\infty} (q^2)^{i-1}$
$= \frac{pq}{1 - q^2}$
$= \frac{pq}{(1-q)(1+q)} = \frac{q}{1+q}$

On the other hand,
$P(E) = P(E \mid X = 1)p + P(E \mid X > 1)q = qP(E \mid X > 1)$

However, given that the first trial is not a success, the number of trials
needed for a success is 1 plus the geometrically distributed number of
additional trials required. Therefore,

$P(E \mid X > 1) = P(X + 1 \text{ is even}) = P(E^c) = 1 - P(E)$
which yields P(E) = q/(1 + q).
4.27.
(a) In order for N = 6 one of the teams must be up 3 games to 2
after the first 5 games and then must win game 6. This gives
$P(N = 6) = \binom{5}{3}p^3(1-p)^2 p + \binom{5}{3}(1-p)^3p^2(1-p)$
$= 10\left(p^4(1-p)^2 + (1-p)^4p^2\right)$
On the other hand, N = 7 if each team wins 3 of the first 6 games. Hence,
$P(N = 7) = \binom{6}{3}p^3(1-p)^3 = 20p^3(1-p)^3$

Hence,
$P(N = 6) - P(N = 7) = p^2(1-p)^2\left(10p^2 + 10(1-p)^2 - 20p(1-p)\right)$
$= p^2(1-p)^2(40p^2 - 40p + 10)$

Calculus shows that $40p^2 - 40p + 10$ is minimized when p = 1/2 with the
minimizing value equal to 0.

(b) In order for N = 6 one of the teams must be up 3 games to 2 after the
first 5 games, and because when p = 1/2 each team is equally likely to
win game 6, it is just as likely that N will equal 6 as that it will equal 7.

(c) Imagine that the teams continue to play even after one of them has
won the series. The team that wins the first game must win at least 3 of
the next 6 games played to win the series. Hence, the desired answer is
$\sum_{i=3}^{6} \binom{6}{i}(1/2)^6 = 42/64$
i=3 


4.28. 
a. The negative binomial represents the number of balls withdrawn in 
a similar experiment but with the exception that the withdrawn ball 
would be replaced before the next drawing. 
b. Using the hint, we note that X = r if the first r − 1 balls withdrawn
contain exactly k − 1 white balls and the next withdrawn ball is
white. Hence,
$P\{X = r\} = \frac{\binom{n}{k-1}\binom{m}{r-k}}{\binom{n+m}{r-1}}\cdot\frac{n-k+1}{n+m-r+1}, \quad k \le r \le m + k$


4.29. 


1/8 
a.3 (F)earas@ray + (1/2)° + (3/4)°(1/4)") 
1 4 5 4
b. 3 1(2/3)G/3) + G/2)" + (1/4)"(3/4)] 


4.30. | Binomial with parameters n and 1 — p. 


4.31. 
k-1\/n+m-k 
oe) 
— [nt+m\ 
ey 
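4.32. X = i if the first i − 1 balls consist of r − 1 red and i − r blue
balls, and the next ball is red. Hence,
$P(X = i) = \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\cdot\frac{n-r+1}{n+m-i+1}$

Let Y be the number of balls that have to be removed until a total of s blue
balls have been removed. Then, V = min(X, Y) and, for i < r + s,
$P(V = i) = P(X = i) + P(Y = i)$
$= \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\cdot\frac{n-r+1}{n+m-i+1} + \frac{\binom{n}{i-s}\binom{m}{s-1}}{\binom{n+m}{i-1}}\cdot\frac{m-s+1}{n+m-i+1}$

Now, Z = max(X, Y). Because $Z \ge r + s$, and $Z = i \ge r + s$ either if X = i
or if Y = i, we have, for $i \ge r + s$, that
$P(Z = i) = P(X = i) + P(Y = i)$
$= \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\cdot\frac{n-r+1}{n+m-i+1} + \frac{\binom{n}{i-s}\binom{m}{s-1}}{\binom{n+m}{i-1}}\cdot\frac{m-s+1}{n+m-i+1}$

X < Y if the rth red ball is removed before a total of r + s balls have been
removed. That is,
$P(X < Y) = P(X < r + s) = \sum_{i=r}^{r+s-1} \frac{\binom{n}{r-1}\binom{m}{i-r}}{\binom{n+m}{i-1}}\cdot\frac{n-r+1}{n+m-i+1}$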


Chapter 5 


5.1. Let X be the number of minutes played. 

a. P{X > 15} = 1 − P{X < 15} = 1 − 5(.025) = .875
b. P{20 < X < 35} = 10(.05) + 5(.025) = .625
c. P{X < 30} = 10(.025) + 10(.05) = .75
d. P{X > 36} = 4(.025) = .1


5.2. 
a. $1 = \int_0^1 c x^n\,dx = c/(n+1) \Rightarrow c = n + 1$

b. $P\{X > x\} = (n+1)\int_x^1 y^n\,dy = y^{n+1}\Big|_x^1 = 1 - x^{n+1}$


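5.3. First, let us find c by using
$1 = \int_0^2 c x^4\,dx = 32c/5 \Rightarrow c = 5/32$

a. $E[X] = \frac{5}{32}\int_0^2 x^5\,dx = \frac{5}{32}\cdot\frac{64}{6} = 5/3$
b. $E[X^2] = \frac{5}{32}\int_0^2 x^6\,dx = \frac{5}{32}\cdot\frac{128}{7} = 20/7 \Rightarrow \mathrm{Var}(X) = 20/7 - (5/3)^2 = 5/63$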
5.4. Since
$1 = \int_0^1 (ax + bx^2)\,dx = a/2 + b/3$
$.6 = \int_0^1 (ax^2 + bx^3)\,dx = a/3 + b/4$
we obtain a = 3.6, b = −2.4. Hence,

a. $P\{X < 1/2\} = \int_0^{1/2} (3.6x - 2.4x^2)\,dx = (1.8x^2 - .8x^3)\Big|_0^{1/2} = .35$
b. $E[X^2] = \int_0^1 (3.6x^3 - 2.4x^4)\,dx = .42 \Rightarrow \mathrm{Var}(X) = .06$
5.5. For i = 1, ..., n,
$P\{X = i\} = P\{\mathrm{Int}(nU) = i - 1\}$
$= P\{i - 1 \le nU < i\}$
$= P\left\{\frac{i-1}{n} \le U < \frac{i}{n}\right\}$
$= 1/n$


5.6. If you bid x,70 < x < 140, then you will either win the bid and make a 
profit of x — 100 with probability (140 — x)/70 or lose the bid and make a 
profit of 0 otherwise. Therefore, your expected profit if you bid x is 


$\frac{1}{70}(x - 100)(140 - x) = \frac{1}{70}(240x - x^2 - 14000)$


Differentiating and setting the preceding equal to 0 gives 
240 —2x =0 


Therefore, you should bid $120,000. Your expected profit will be 40/7 
thousand dollars. 
5.7. 
a. P{U > .1} = 9/10 
b. P{U > .2|U > .1} = P{U > .2}/P{U > .1} = 8/9 
c. P{U > .3|U > .2, U > .1} = P{U > .3}/P{U > .2} = 7/8
d. P{U > .3} = 7/10 
The answer to part (d) could also have been obtained by multiplying 
the probabilities in parts (a), (b), and (c). 


5.8. Let X be the test score, and let Z = (X — 100)/15. Note that Z is a 


standard normal random variable. 
a. P{X > 125} = P{Z > 25/15} = .0478 
b. P{90 < X < 110} = P{−10/15 < Z < 10/15}
= P{Z < 2/3} − P{Z < −2/3}
= P{Z < 2/3} − [1 − P{Z < 2/3}]
≈ .4950


5.9. Let X be the travel time. We want to find x such that 
P{X > x} = .05 


which is equivalent to
$P\left\{\frac{X - 40}{7} > \frac{x - 40}{7}\right\} = .05$

That is, we need to find x such that
$P\left\{Z > \frac{x - 40}{7}\right\} = .05$

where Z is a standard normal random variable. But
P{Z > 1.645} = .05

Thus,
$\frac{x - 40}{7} = 1.645$ or x = 51.515


Therefore, you should leave no later than 8.485 minutes after 12 P.M. 
5.10. Let X be the tire life in units of one thousand, and let Z = (X — 34)/4. 
Note that Z is a standard normal random variable. 
a. P{X > 40} = P{Z > 1.5} = .0668 
b. P{30 < X < 35} = P{−1 < Z < .25} = P{Z < .25} − P{Z > 1} ≈ .44
c. P{X > 40|X > 30} = P{X > 40}/P{X > 30}
= P{Z > 1.5}/P{Z > −1} ≈ .079


5.11. Let X be next year’s rainfall and let Z = (X — 40.2) /8.4. 
a. P{X > 44} = P{Z > 3.8/8.4} = P{Z > .4524} ≈ .3255

b. $\binom{7}{3}(.3255)^3(.6745)^4$


5.12. Let M; and W; denote, respectively, the numbers of men and women 
in the samples that earn, in units of $1,000, at least i per year. Also, let Z be a 
standard normal random variable. 
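a. $P\{W_{25} \ge 70\} = P\{W_{25} \ge 69.5\}$
$= P\left\{\frac{W_{25} - 200(.34)}{\sqrt{200(.34)(.66)}} \ge \frac{69.5 - 200(.34)}{\sqrt{200(.34)(.66)}}\right\}$
$\approx P\{Z \ge .2239\}$
$\approx .4114$

b. $P\{M_{25} \le 120\} = P\{M_{25} \le 120.5\}$
$= P\left\{\frac{M_{25} - (200)(.587)}{\sqrt{(200)(.587)(.413)}} \le \frac{120.5 - (200)(.587)}{\sqrt{(200)(.587)(.413)}}\right\}$
$\approx P\{Z \le .4452\}$
$\approx .6719$

c. $P\{M_{20} \ge 150\} = P\{M_{20} \ge 149.5\}$
$= P\left\{\frac{M_{20} - (200)(.745)}{\sqrt{(200)(.745)(.255)}} \ge \frac{149.5 - (200)(.745)}{\sqrt{(200)(.745)(.255)}}\right\}$
$\approx P\{Z \ge .0811\}$
$\approx .4677$
$P\{W_{20} \ge 100\} = P\{W_{20} \ge 99.5\}$
$= P\left\{\frac{W_{20} - (200)(.534)}{\sqrt{(200)(.534)(.466)}} \ge \frac{99.5 - (200)(.534)}{\sqrt{(200)(.534)(.466)}}\right\}$
$\approx P\{Z \ge -1.0348\}$
$\approx .8496$

Hence,
$P\{M_{20} \ge 150\}P\{W_{20} \ge 100\} \approx .3974$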
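
The normal approximations in 5.12 all follow one pattern: apply the continuity correction, standardize, and evaluate the standard normal distribution function. A minimal sketch of that pattern is below; the helper names `phi` and `approx_at_least` are ours, and the sample size 200 and the proportions are the ones used in the solution.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_at_least(n: int, p: float, k: int) -> float:
    """Normal approximation, with continuity correction, to P{Bin(n, p) >= k}."""
    z = (k - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 1.0 - phi(z)


part_a = approx_at_least(200, 0.34, 70)
part_c = approx_at_least(200, 0.745, 150) * approx_at_least(200, 0.534, 100)
print(round(part_a, 4), round(part_c, 4))
```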


5.13. The lack of memory property of the exponential gives the result
$e^{-4/5}$.


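c. $\lambda(t) = 2te^{-t^2}/e^{-t^2} = 2t$
d. Let Z be a standard normal random variable. Use the identity
$E[X] = \int_0^\infty P\{X > x\}\,dx$ to obtain
$E[X] = \int_0^\infty e^{-x^2}\,dx$
$= 2^{-1/2}\int_0^\infty e^{-y^2/2}\,dy$
$= 2^{-1/2}\sqrt{2\pi}\,P\{Z > 0\}$
$= \sqrt{\pi}/2$

e. Use the result of Theoretical Exercise 5.5 to obtain
$E[X^2] = \int_0^\infty 2xe^{-x^2}\,dx = -e^{-x^2}\Big|_0^\infty = 1$

Hence, Var(X) = 1 − π/4.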
5.15.
a. $P\{X > 6\} = \exp\left\{-\int_0^6 \lambda(t)\,dt\right\} = e^{-3.45}$
b. $P\{X < 8 \mid X > 6\} = 1 - P\{X > 8 \mid X > 6\}$
$= 1 - P\{X > 8\}/P\{X > 6\}$
$= 1 - e^{-5.65}/e^{-3.45}$
$\approx .8892$


5.16. For x > 0,
$F_{1/X}(x) = P\{1/X \le x\}$
$= P\{X \le 0\} + P\{X \ge 1/x\}$
$= 1/2 + 1 - F_X(1/x)$

Differentiation yields
$f_{1/X}(x) = x^{-2}f_X(1/x)$
$= \frac{1}{x^2\pi(1 + (1/x)^2)}$
$= f_X(x)$

The proof when x < 0 is similar.
5.17. If X denotes the number of the first n bets that you win, then the 


amount that you will be winning after n bets is 
35X — (n—X) = 36X—-n 


Thus, we want to determine 
a = P{36X —n > 0} = P{X > n/36} 


when X is a binomial random variable with parameters n and p = 1/38. 
a. When n = 34,
$a = P\{X \ge 1\}$
$= P\{X > .5\}$ (the continuity correction)
$= P\left\{\frac{X - 34/38}{\sqrt{34(1/38)(37/38)}} > \frac{.5 - 34/38}{\sqrt{34(1/38)(37/38)}}\right\}$
$= P\left\{\frac{X - 34/38}{\sqrt{34(1/38)(37/38)}} > -.4229\right\}$
$\approx \Phi(.4229)$
$\approx .6638$

(Because you will be ahead after 34 bets if you win at least 1 bet, the
exact probability in this case is $1 - (37/38)^{34} = .5961$.)
b. When n = 1000,
$a = P\{X > 27.5\}$
$= P\left\{\frac{X - 1000/38}{\sqrt{1000(1/38)(37/38)}} > \frac{27.5 - 1000/38}{\sqrt{1000(1/38)(37/38)}}\right\}$
$\approx 1 - \Phi(.2339)$
$\approx .4075$

The exact probability (namely, the probability that a binomial n = 1000,
p = 1/38 random variable is greater than 27) is .3961.
c. When n = 100,000,
$a = P\{X > 2777.5\}$
$= P\left\{\frac{X - 100000/38}{\sqrt{100000(1/38)(37/38)}} > \frac{2777.5 - 100000/38}{\sqrt{100000(1/38)(37/38)}}\right\}$
$\approx 1 - \Phi(2.883)$
$\approx .0020$

The exact probability in this case is .0021.
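
The comparison in 5.17 between exact binomial probabilities and their normal approximations can be reproduced as follows. This is a sketch under the solution's setup (win probability 1/38, ahead exactly when 36X − n > 0); the function names are ours, and the binomial terms are summed in log space so that n = 100,000 does not overflow.

```python
from math import erf, exp, lgamma, log, sqrt

P_WIN = 1 / 38  # probability of winning a single bet


def min_wins_to_be_ahead(n: int) -> int:
    """Smallest integer k satisfying 36 * k - n > 0."""
    return n // 36 + 1


def exact_prob_ahead(n: int, p: float = P_WIN) -> float:
    """Exact P{X >= k_min} for X ~ Binomial(n, p)."""
    k_min = min_wins_to_be_ahead(n)
    below = 0.0
    for k in range(k_min):
        log_pmf = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p)
        )
        below += exp(log_pmf)
    return 1.0 - below


def normal_prob_ahead(n: int, p: float = P_WIN) -> float:
    """Normal approximation with continuity correction to the same probability."""
    k_min = min_wins_to_be_ahead(n)
    z = (k_min - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))


for n in (34, 1000, 100_000):
    print(n, round(normal_prob_ahead(n), 4), round(exact_prob_ahead(n), 4))
```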


5.18. If X denotes the lifetime of the battery, then the desired probability, 
P{X >s+t|X >t}, can be determined as follows: 
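$P\{X > s + t \mid X > t\} = \frac{P\{X > s + t, X > t\}}{P\{X > t\}}$
$= \frac{P\{X > s + t\}}{P\{X > t\}}$
$= \frac{P\{X > s + t \mid \text{battery is type 1}\}p_1 + P\{X > s + t \mid \text{battery is type 2}\}p_2}{P\{X > t \mid \text{battery is type 1}\}p_1 + P\{X > t \mid \text{battery is type 2}\}p_2}$
$= \frac{e^{-\lambda_1(s+t)}p_1 + e^{-\lambda_2(s+t)}p_2}{e^{-\lambda_1 t}p_1 + e^{-\lambda_2 t}p_2}$
Another approach is to directly condition on the type of battery and then use
the lack-of-memory property of exponential random variables. That is, we
could do the following:
$P\{X > s + t \mid X > t\}$
$= P\{X > s + t \mid X > t, \text{type 1}\}P\{\text{type 1} \mid X > t\} + P\{X > s + t \mid X > t, \text{type 2}\}P\{\text{type 2} \mid X > t\}$
$= e^{-\lambda_1 s}P\{\text{type 1} \mid X > t\} + e^{-\lambda_2 s}P\{\text{type 2} \mid X > t\}$

Now for i = 1, 2, use
$P\{\text{type } i \mid X > t\} = \frac{P\{\text{type } i, X > t\}}{P\{X > t\}}$
$= \frac{P\{X > t \mid \text{type } i\}p_i}{P\{X > t \mid \text{type 1}\}p_1 + P\{X > t \mid \text{type 2}\}p_2}$
$= \frac{e^{-\lambda_i t}p_i}{e^{-\lambda_1 t}p_1 + e^{-\lambda_2 t}p_2}$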


5.19. | Let X; be an exponential random variable with mean i, i = 1, 2. 
a. The value c should be such that P{X, > c} = .05. Therefore, 
$e^{-c} = .05 = 1/20$

or c = log(20) = 2.996.
b. $P\{X_2 > c\} = e^{-c/2} = \frac{1}{\sqrt{20}} = .2236$
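5.20.
a. $E[(Z - c)^+] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (x - c)^+ e^{-x^2/2}\,dx$
$= \frac{1}{\sqrt{2\pi}}\int_c^{\infty} (x - c)e^{-x^2/2}\,dx$
$= \frac{1}{\sqrt{2\pi}}\int_c^{\infty} xe^{-x^2/2}\,dx - \frac{1}{\sqrt{2\pi}}\int_c^{\infty} ce^{-x^2/2}\,dx$
$= -\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\Big|_c^{\infty} - c(1 - \Phi(c))$
$= \frac{1}{\sqrt{2\pi}}e^{-c^2/2} - c(1 - \Phi(c))$

b. Using the fact that X has the same distribution as μ + σZ, where Z is a
standard normal random variable, yields
$E[(X - c)^+] = E[(\mu + \sigma Z - c)^+]$
$= \sigma E[(Z - a)^+]$
$= \sigma\left[\frac{1}{\sqrt{2\pi}}e^{-a^2/2} - a(1 - \Phi(a))\right]$
where $a = \frac{c - \mu}{\sigma}$.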


5.21. Only (b) is true.

5.22.
a. If b > 0, then for 0 < x < b,
$P\{bU < x\} = P\{U < x/b\} = x/b$

Hence,
$f_{bU}(x) = 1/b, \quad 0 < x < b$

The argument when b < 0 is similar.
b. For a < x < 1 + a,
$P\{a + U < x\} = P\{U < x - a\} = x - a$

Differentiation yields
$f_{a+U}(x) = 1, \quad a < x < 1 + a$

c. a + (b − a)U
d. For 0 < x < 1/2,
$P\{\min(U, 1 - U) < x\} = P(\{U < x\} \cup \{U > 1 - x\})$
$= P\{U < x\} + P\{U > 1 - x\} = 2x$

Differentiating gives
$f_{\min(U,1-U)}(x) = 2, \quad 0 < x < 1/2$

e. Using that max(U, 1 − U) = 1 − min(U, 1 − U), the result follows from
(a), (b), and (d). A direct argument is that, for 1/2 < x < 1,
$P\{\max(U, 1 - U) < x\} = 1 - P\{\max(U, 1 - U) > x\}$
$= 1 - P(\{U > x\} \cup \{U < 1 - x\})$
$= 1 - (1 - x) - (1 - x) = 2x - 1$

Hence,
$f_{\max(U,1-U)}(x) = 2, \quad 1/2 < x < 1$
5.23.
b. E[X] = 1/2
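5.24.
a. $\int_0^\infty \frac{\theta^2}{1+\theta}(1 + x)e^{-\theta x}\,dx = \frac{\theta}{1+\theta}\left(1 + \frac{1}{\theta}\right) = 1$.
b. With Y being exponential with rate θ,
$E[X] = \frac{\theta}{1+\theta}\left(E[Y] + E[Y^2]\right) = \frac{\theta}{1+\theta}\left(\frac{1}{\theta} + \frac{2}{\theta^2}\right) = \frac{\theta + 2}{\theta(1+\theta)}$
$E[X^2] = \frac{\theta}{1+\theta}\left(E[Y^2] + E[Y^3]\right) = \frac{\theta}{1+\theta}\left(\frac{2}{\theta^2} + \frac{6}{\theta^3}\right) = \frac{2\theta + 6}{\theta^2(1+\theta)}$
Hence,
$\mathrm{Var}(X) = \frac{2\theta + 6}{\theta^2(1+\theta)} - \frac{(\theta + 2)^2}{\theta^2(1+\theta)^2}$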
Chapter 6 
6.1.
a. 3C + 6C = 1 ⇒ C = 1/9
b. Let p(i, j) = P{X = i, Y = j}. Then
p(1, 1) = 4/9, p(1, 0) = 2/9, p(0, 1) = 1/9, p(0, 0) = 2/9

$\sum_{i=8}^{12} \binom{12}{i}(2/3)^i(1/3)^{12-i}$

6.2.
a. With $p_j = P\{XYZ = j\}$, we have
$p_6 = p_2 = p_4 = p_{12} = 1/4$

Hence,
E[XYZ] = (6 + 2 + 4 + 12)/4 = 6

b. With $q_j = P\{XY + XZ + YZ = j\}$, we have
$q_{11} = q_5 = q_8 = q_{16} = 1/4$

Hence,
E[XY + XZ + YZ] = (11 + 5 + 8 + 16)/4 = 10


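6.3. In this solution, we will make use of the identity
$\int_0^\infty e^{-x}x^n\,dx = n!$

which follows because $e^{-x}x^n/n!$, x > 0, is the density function of a
gamma random variable with parameters n + 1 and 1 and must thus
integrate to 1.

a. $1 = C\int_0^\infty e^{-y}\int_{-y}^{y} (y - x)\,dx\,dy = C\int_0^\infty e^{-y}2y^2\,dy = 4C$

Hence, C = 1/4.
b. Since the joint density is nonzero only when y > x and y > −x, we
have, for x > 0,
$f_X(x) = \frac{1}{4}\int_x^\infty (y - x)e^{-y}\,dy$
$= \frac{1}{4}\int_0^\infty ue^{-(x+u)}\,du$
$= \frac{1}{4}e^{-x}$

For x < 0,
$f_X(x) = \frac{1}{4}\int_{-x}^\infty (y - x)e^{-y}\,dy$
$= \frac{1}{4}\left[-ye^{-y} - e^{-y} + xe^{-y}\right]_{-x}^{\infty}$
$= (-2xe^{x} + e^{x})/4$

c. $f_Y(y) = \frac{1}{4}e^{-y}\int_{-y}^{y} (y - x)\,dx = \frac{1}{2}y^2e^{-y}$

d. $E[X] = \frac{1}{4}\left[\int_0^\infty xe^{-x}\,dx + \int_{-\infty}^0 (-2x^2e^{x} + xe^{x})\,dx\right]$
$= \frac{1}{4}\left[1 - \int_0^\infty (2y^2e^{-y} + ye^{-y})\,dy\right]$
$= \frac{1}{4}(1 - 4 - 1) = -1$

e. $E[Y] = \frac{1}{2}\int_0^\infty y^3e^{-y}\,dy = 3$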

6.4. The multinomial random variables $X_i$, i = 1, ..., r, represent the
numbers of each of the types of outcomes 1, ..., r that occur in n
independent trials when each trial results in one of the outcomes 1, ..., r
with respective probabilities $p_1, \ldots, p_r$. Now, say that a trial results in a
category 1 outcome if that trial resulted in any of the outcome types
1, ..., $r_1$; say that a trial results in a category 2 outcome if that trial resulted
in any of the outcome types $r_1 + 1, \ldots, r_1 + r_2$; and so on. With these
definitions, $Y_1, \ldots, Y_k$ represent the numbers of category 1 outcomes,
category 2 outcomes, up to category k outcomes when n independent
trials that each result in one of the categories 1, ..., k with respective
probabilities $\sum_{j=r_{i-1}+1}^{r_{i-1}+r_i} p_j$, i = 1, ..., k, are performed. But by definition,
such a vector has a multinomial distribution.
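6.5.
a. Letting $p_j = P\{XYZ = j\}$, we have
$p_1 = 1/8, \quad p_2 = 3/8, \quad p_4 = 3/8, \quad p_8 = 1/8$

b. Letting $p_j = P\{XY + XZ + YZ = j\}$, we have
$p_3 = 1/8, \quad p_5 = 3/8, \quad p_8 = 3/8, \quad p_{12} = 1/8$

c. Letting $p_j = P\{X^2 + YZ = j\}$, we have
$p_2 = 1/8, \quad p_3 = 1/4, \quad p_5 = 1/4, \quad p_6 = 1/4, \quad p_8 = 1/8$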
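
The three mass functions in 6.5 can be confirmed by enumerating all eight outcomes. This sketch assumes, as the listed probabilities suggest, that X, Y, and Z are independent and each equally likely to be 1 or 2; that assumption and the helper name `pmf_of` are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pmf_of(g) -> dict[int, Fraction]:
    """Mass function of g(X, Y, Z) when X, Y, Z are i.i.d. uniform on {1, 2}."""
    counts = Counter(g(x, y, z) for x, y, z in product((1, 2), repeat=3))
    return {value: Fraction(c, 8) for value, c in sorted(counts.items())}


print(pmf_of(lambda x, y, z: x * y * z))
print(pmf_of(lambda x, y, z: x * y + x * z + y * z))
print(pmf_of(lambda x, y, z: x * x + y * z))
```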


6.6.
a. $1 = \int_0^1\int_1^5 (x/5 + cy)\,dy\,dx$
$= \int_0^1 (4x/5 + 12c)\,dx$
$= 12c + 2/5$

Hence, c = 1/20.
b. No, the density does not factor.
c. $P\{X + Y > 3\} = \int_0^1\int_{3-x}^5 (x/5 + y/20)\,dy\,dx$
$= \int_0^1 \left[(2 + x)x/5 + 25/40 - (3 - x)^2/40\right]dx$
$= 1/5 + 1/15 + 5/8 - 19/120 = 11/15$


6.7. 
a. Yes, the joint density function factors. 
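b. $f_X(x) = x\int_0^2 y\,dy = 2x, \quad 0 < x < 1$
c. $f_Y(y) = y\int_0^1 x\,dx = y/2, \quad 0 < y < 2$
d. $P\{X < x, Y < y\} = P\{X < x\}P\{Y < y\} = \min(1, x^2)\min(1, y^2/4), \quad x > 0, y > 0$
e. $E[Y] = \int_0^2 y^2/2\,dy = 4/3$
f. $P\{X + Y < 1\} = \int_0^1 x\int_0^{1-x} y\,dy\,dx = \frac{1}{2}\int_0^1 x(1 - x)^2\,dx = 1/24$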

0 


6.8. Let $T_i$ denote the time at which a shock of type i, i = 1, 2, 3, occurs.
For s > 0, t > 0,
$P\{X_1 > s, X_2 > t\} = P\{T_1 > s, T_2 > t, T_3 > \max(s, t)\}$
$= P\{T_1 > s\}P\{T_2 > t\}P\{T_3 > \max(s, t)\}$
$= \exp\{-\lambda_1 s\}\exp\{-\lambda_2 t\}\exp\{-\lambda_3\max(s, t)\}$
$= \exp\{-(\lambda_1 s + \lambda_2 t + \lambda_3\max(s, t))\}$

6.9. 
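a. No, advertisements on pages having many ads are less likely to be
chosen than are ones on pages with few ads.
b. $\frac{1}{m}\cdot\frac{n(i)}{n}$
c. $\sum_{i=1}^{m} \frac{1}{m}\cdot\frac{n(i)}{n} = \bar{n}/n$, where $\bar{n} = \sum_{i=1}^{m} n(i)/m$
d. $(1 - \bar{n}/n)^{k-1}\frac{1}{m}\cdot\frac{n(i)}{n}\cdot\frac{1}{n(i)} = \frac{1}{nm}(1 - \bar{n}/n)^{k-1}$
e. $\sum_{k=1}^{\infty} \frac{1}{nm}(1 - \bar{n}/n)^{k-1} = \frac{1}{\bar{n}m}$
f. The number of iterations is geometric with mean $n/\bar{n}$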


6.10. 
a.P{X=i}=1/m, i=1,...,m. 
b. Step 2. Generate a uniform (0, 1) random variable U. If 
U < n(X)/n, go to step 3. Otherwise return to step 1. 
Step 3. Generate a uniform (0, 1) random variable U, and select 
the element on page X in position [n(X)U] + 1 
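
The procedure in 6.10 is an acceptance-rejection scheme: pick a page uniformly, accept it with probability n(X)/n, then pick a position uniformly on the accepted page. A minimal sketch follows. It assumes that step 1 draws the page uniformly (as in part (a)) and takes n to be the largest n(i); the name `choose_ad` and the 0-based indexing are ours.

```python
import random


def choose_ad(ads_per_page: list[int], rng: random.Random) -> tuple[int, int]:
    """Return a (page, position) pair, both 0-based, uniform over all ads."""
    m = len(ads_per_page)
    n = max(ads_per_page)  # assumed bound with n(i) <= n for every page
    while True:
        page = int(m * rng.random())                      # step 1: uniform page
        if rng.random() <= ads_per_page[page] / n:        # step 2: accept or retry
            position = int(ads_per_page[page] * rng.random())  # step 3
            return page, position


# Usage: with pages holding 1, 3, and 2 ads, each of the 6 ads
# should be selected about equally often.
rng = random.Random(0)
tally: dict[tuple[int, int], int] = {}
for _ in range(20_000):
    ad = choose_ad([1, 3, 2], rng)
    tally[ad] = tally.get(ad, 0) + 1
print(sorted(tally.items()))
```

Pages with few ads are rejected more often in step 2, which exactly offsets the higher per-ad chance they would otherwise receive in step 3.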


6.11. Yes, they are independent. This can be easily seen by
considering the equivalent question of whether $X_N$ is independent of N.
But this is indeed so, since knowing when the first random variable
greater than c occurs does not affect the probability distribution of its
value, which is the uniform distribution on (c, 1).
6.12. Let $p_i$ denote the probability of obtaining i points on a single
throw of the dart. Then
$p_{30} = \pi/36$
$p_{20} = 4\pi/36 - p_{30} = \pi/12$
$p_{10} = 9\pi/36 - p_{20} - p_{30} = 5\pi/36$
$p_0 = 1 - p_{10} - p_{20} - p_{30} = 1 - \pi/4$
a. π/12
b. π/9
c. 1 − π/4
d. π(30/36 + 20/12 + 50/36) = 35π/9
e. $(\pi/4)^2$
f. 2(π/36)(1 − π/4) + 2(π/12)(5π/36)


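6.13. Let Z be a standard normal random variable.
a. $P\left\{\sum_{i=1}^{4} X_i > 0\right\} = P\left\{\frac{\sum_{i=1}^{4} X_i - 6}{\sqrt{24}} > \frac{-6}{\sqrt{24}}\right\}$
$\approx P\{Z > -1.2247\} \approx .8897$
b. $P\left\{\sum_{i=1}^{4} X_i > 0 \,\Big|\, \sum_{i=1}^{2} X_i = -5\right\} = P\{X_3 + X_4 > 5\}$
$= P\left\{\frac{X_3 + X_4 - 3}{\sqrt{12}} > \frac{2}{\sqrt{12}}\right\}$
$\approx P\{Z > .5774\} \approx .2818$
c. $P\left\{\sum_{i=1}^{4} X_i > 0 \,\Big|\, X_1 = 5\right\} = P\{X_2 + X_3 + X_4 > -5\}$
$= P\left\{\frac{X_2 + X_3 + X_4 - 4.5}{\sqrt{18}} > \frac{-9.5}{\sqrt{18}}\right\}$
$\approx P\{Z > -2.239\} \approx .9874$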


6.14. In the following, C does not depend on n.
$P\{N = n \mid X = x\} = f_{X|N}(x \mid n)P\{N = n\}/f_X(x)$
$= C\frac{1}{(n-1)!}(\lambda x)^{n-1}(1 - p)^{n-1}$
$= C(\lambda(1 - p)x)^{n-1}/(n - 1)!$

which shows that, conditional on X = x, N − 1 is a Poisson random
variable with mean λ(1 − p)x. That is,
$P\{N = n \mid X = x\} = P\{N - 1 = n - 1 \mid X = x\}$
$= e^{-\lambda(1-p)x}(\lambda(1 - p)x)^{n-1}/(n - 1)!, \quad n \ge 1.$


6.15.
a. The Jacobian of the transformation is
$J = \begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} = 1$

As the equations u = x, v = x + y imply that x = u, y = v − u, we
obtain
$f_{U,V}(u, v) = f_{X,Y}(u, v - u) = 1, \quad 0 < u < 1, \; 0 < v - u < 1$

or, equivalently,
$f_{U,V}(u, v) = 1, \quad \max(v - 1, 0) < u < \min(v, 1)$

b. For 0 < v < 1,
$f_V(v) = \int_0^v du = v$
For $1 \le v \le 2$,
$f_V(v) = \int_{v-1}^1 du = 2 - v$


6.16. Let U be a uniform random variable on (7, 11). If you bid
x, $7 \le x \le 10$, you will be the high bidder with probability
$(P\{U < x\})^3 = \left(P\left\{\frac{U - 7}{4} < \frac{x - 7}{4}\right\}\right)^3 = \left(\frac{x - 7}{4}\right)^3$

Hence, your expected gain, call it E[G(x)], if you bid x is
$E[G(x)] = \frac{1}{64}(x - 7)^3(10 - x)$

Calculus shows this is maximized when x = 37/4.
6.17. Let $i_1, i_2, \ldots, i_n$ be a permutation of 1, 2, ..., n. Then
$P\{X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n\} = P\{X_1 = i_1\}P\{X_2 = i_2\}\cdots P\{X_n = i_n\}$
$= p_{i_1}p_{i_2}\cdots p_{i_n}$
$= p_1p_2\cdots p_n$

Therefore, the desired probability is $n!\,p_1p_2\cdots p_n$, which reduces to $\frac{n!}{n^n}$
when all $p_i = 1/n$.

6.18.
a. Because $\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$, it follows that N = 2M.
b. Consider the n − k coordinates whose Y-values are equal to 0, and
call them the red coordinates. Because the k coordinates whose X-values
are equal to 1 are equally likely to be any of the $\binom{n}{k}$ sets
of k coordinates, it follows that the number of red coordinates
among these k coordinates has the same distribution as the
number of red balls chosen when one randomly chooses k of a set
of n balls of which n − k are red. Therefore, M is a hypergeometric
random variable.
c. $E[N] = E[2M] = 2E[M] = \frac{2k(n - k)}{n}$
d. Using the formula for the variance of a hypergeometric given in
Example 8j of Chapter 4, we obtain
$\mathrm{Var}(N) = 4\mathrm{Var}(M) = 4\,\frac{n - k}{n - 1}\,k(1 - k/n)(k/n)$
6.19.
a. First note that $S_n - S_k = \sum_{i=k+1}^{n} Z_i$ is a normal random variable
with mean 0 and variance n − k that is independent of $S_k$.
Consequently, given that $S_k = y$, $S_n$ is a normal random variable
with mean y and variance n − k.

b. Because the conditional density function of $S_k$ given that $S_n = x$ is
a density function whose argument is y, anything that does not
depend on y can be regarded as a constant. (For instance, x is
regarded as a fixed constant.) In the following, the quantities
$C_i$, i = 1, 2, 3, 4, are all constants that do not depend on y:
$f_{S_k|S_n}(y \mid x) = \frac{f_{S_k,S_n}(y, x)}{f_{S_n}(x)}$
$= C_1 f_{S_n|S_k}(x \mid y)f_{S_k}(y) \quad \left[\text{where } C_1 = \frac{1}{f_{S_n}(x)}\right]$
$= C_1\frac{1}{\sqrt{2\pi}\sqrt{n-k}}e^{-(x-y)^2/2(n-k)}\frac{1}{\sqrt{2\pi}\sqrt{k}}e^{-y^2/2k}$
$= C_2\exp\left\{-\frac{(x-y)^2}{2(n-k)} - \frac{y^2}{2k}\right\}$
$= C_3\exp\left\{\frac{2xy}{2(n-k)} - \frac{y^2}{2(n-k)} - \frac{y^2}{2k}\right\}$
$= C_3\exp\left\{-\frac{n}{2k(n-k)}\left(y^2 - 2\frac{k}{n}xy\right)\right\}$
$= C_4\exp\left\{-\frac{n}{2k(n-k)}\left(y - \frac{k}{n}x\right)^2\right\}$

But we recognize the preceding as the density function of a normal
random variable with mean $\frac{k}{n}x$ and variance $\frac{k(n-k)}{n}$.


6.20.
a. $P\{X_6 > X_1 \mid X_1 = \max(X_1, \ldots, X_5)\}$
$= \frac{P\{X_6 > X_1, X_1 = \max(X_1, \ldots, X_5)\}}{P\{X_1 = \max(X_1, \ldots, X_5)\}}$
$= \frac{P\{X_6 = \max(X_1, \ldots, X_6), X_1 = \max(X_1, \ldots, X_5)\}}{1/5}$
$= 5\cdot\frac{1}{6}\cdot\frac{1}{5} = \frac{1}{6}$

Thus, the probability that $X_6$ is the largest value is independent of
which is the largest of the other five values. (Of course, this would
not be true if the $X_i$ had different distributions.)
b. One way to solve this problem is to condition on whether $X_6 > X_1$.
Now,
$P\{X_6 > X_2 \mid X_1 = \max(X_1, \ldots, X_5), X_6 > X_1\} = 1$

Also, by symmetry,
$P\{X_6 > X_2 \mid X_1 = \max(X_1, \ldots, X_5), X_6 < X_1\} = \frac{1}{2}$

From part (a),
$P\{X_6 > X_1 \mid X_1 = \max(X_1, \ldots, X_5)\} = \frac{1}{6}$

Thus, conditioning on whether $X_6 > X_1$ yields the result
$P\{X_6 > X_2 \mid X_1 = \max(X_1, \ldots, X_5)\} = \frac{1}{6} + \frac{1}{2}\cdot\frac{5}{6} = \frac{7}{12}$

6.21.
$P\{X > s, Y > t\} = 1 - P(\{X \le s\} \cup \{Y \le t\})$
$= 1 - P\{X \le s\} - P\{Y \le t\} + P\{X \le s, Y \le t\}$

6.22. Suppose j < i, and consider $P(X_r = i, Y_s = j)$. If there have been
s failures after trial j then there have been j − s successes by that point.
Hence, the conditional distribution of $X_r$, given that $Y_s = j$, is the
distribution of j plus the number of additional trials after trial j until there
have been an additional r − j + s successes. Hence, for j < i,
$P(X_r = i, Y_s = j) = P(Y_s = j)P(X_r = i \mid Y_s = j)$
$= P(Y_s = j)P(X_{s+r-j} = i - j)$
$= \binom{j-1}{s-1}(1 - p)^s p^{j-s}\binom{i-j-1}{s+r-j-1}p^{s+r-j}(1 - p)^{i-s-r}, \quad j < i$
6.23. For $x > x_0$,
$P(X > x \mid X > x_0) = \frac{P(X > x)}{P(X > x_0)} = \frac{a^{\lambda}x^{-\lambda}}{a^{\lambda}x_0^{-\lambda}}$

6.24.
$\int f_{X|Y}(x \mid y)f_Y(y)\,dy = \int \frac{f(x, y)}{f_Y(y)}f_Y(y)\,dy$
$= \int f(x, y)\,dy$
$= f_X(x)$
6.25. 


a. pt es I] = ») 


b. Condition on the number of times i would advance if i played 


(oe) 


forever, to obtain ». pr(1- pal | (1—pF**). 
k=0 j#i 


C: yk = pd] | a =p): 


3 
3 


~ 
N 
R 
| 
any 
ot 
II 
aS 
oN 


Il 
N 
vU 
o 
nt 
| 
ay 
Y 
| 
any 


giving that 


3 
fan 
+ 

i 3 
AA 
D> 
NS 
R 

| 

rary 
—— 


P(Sis even) = P( Y,=1)= 


For 0 < x < 1,
$f_{X|N}(x \mid n) = \frac{P(N = n \mid X = x)f_X(x)}{P(N = n)}$
$= \frac{\binom{n+m}{n}x^n(1 - x)^m x^{a-1}(1 - x)^{b-1}}{B(a, b)P(N = n)}$
$= Kx^{n+a-1}(1 - x)^{m+b-1}$

where $K = \frac{\binom{n+m}{n}}{B(a, b)P(N = n)}$ does not depend on x. Hence, we can
conclude that the conditional density of X given that N = n is beta with
parameters n + a, m + b. As a byproduct, we also see that
$\frac{\binom{n+m}{n}}{B(a, b)P(N = n)} = \frac{1}{B(a + n, b + m)}$

or equivalently that
$P(N = n) = \binom{n+m}{n}\frac{B(a + n, b + m)}{B(a, b)}$
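
The byproduct formula $P(N = n) = \binom{n+m}{n}B(a+n, b+m)/B(a, b)$ should define a probability mass function over n = 0, ..., n + m when the total number of trials is held fixed. The sketch below checks that numerically; the function names are ours, and the uniform-prior case a = b = 1 is included because it should give equal probabilities.

```python
from math import comb, gamma


def beta_fn(a: float, b: float) -> float:
    """Beta function B(a, b) expressed through the gamma function."""
    return gamma(a) * gamma(b) / gamma(a + b)


def prob_n_successes(n: int, m: int, a: float, b: float) -> float:
    """P(N = n) when there are n successes and m failures in n + m trials."""
    return comb(n + m, n) * beta_fn(a + n, b + m) / beta_fn(a, b)


# With the total number of trials fixed, the probabilities should sum to 1.
trials, a, b = 10, 2.5, 1.5
total = sum(prob_n_successes(n, trials - n, a, b) for n in range(trials + 1))
print(total)
```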


Chapter 7 


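7.1.
a. $d = \sum_{i=1}^{m} 1/n(i)$

b. P{X = i} = P{[mU] = i − 1} = P{i − 1 ≤ mU < i} = 1/m, i = 1, ..., m

c. $E\left[\frac{m}{n(X)}\right] = \sum_{i=1}^{m} \frac{m}{n(i)}P\{X = i\} = \sum_{i=1}^{m} \frac{m}{n(i)}\cdot\frac{1}{m} = \sum_{i=1}^{m} \frac{1}{n(i)} = d$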


i=1 i=1 


7.2. Let $I_j$ equal 1 if the jth ball withdrawn is white and the (j + 1)st is
black, and let $I_j$ equal 0 otherwise. If X is the number of instances in
which a white ball is immediately followed by a black one, then we may
express X as
$X = \sum_{j=1}^{n+m-1} I_j$

Thus,
$E[X] = \sum_{j=1}^{n+m-1} E[I_j]$
$= \sum_{j=1}^{n+m-1} P\{j\text{th selection is white}, (j+1)\text{st is black}\}$
$= \sum_{j=1}^{n+m-1} P\{j\text{th selection is white}\}P\{(j+1)\text{st is black} \mid j\text{th is white}\}$
$= \sum_{j=1}^{n+m-1} \frac{n}{n+m}\cdot\frac{m}{n+m-1}$
$= \frac{nm}{n+m}$

The preceding used the fact that each of the n + m balls is equally
likely to be the jth one selected and, given that that selection is a white
ball, each of the other n + m − 1 balls is equally likely to be the next
ball chosen.


7.3. Arbitrarily number the couples, and then let $I_j$ equal 1 if married couple number j, j = 1, ..., 10, is seated at the same table. Then, if X represents the number of married couples that are seated at the same table, we have

$$X = \sum_{j=1}^{10} I_j$$

so

$$E[X] = \sum_{j=1}^{10} E[I_j]$$

a. To compute $E[I_j]$ in this case, consider wife number j. Since each of the $\binom{19}{3}$ groups of size 3 not including her is equally likely to be the remaining members of her table, it follows that the probability that her husband is at her table is

$$\frac{\binom{1}{1}\binom{18}{2}}{\binom{19}{3}} = \frac{3}{19}$$

Hence, $E[I_j] = 3/19$ and so

$$E[X] = 30/19$$

b. In this case, since the 2 men at the table of wife j are equally likely to be any of the 10 men, it follows that the probability that one of them is her husband is 2/10, so

$$E[I_j] = 2/10 \quad \text{and} \quad E[X] = 2$$


7.4. From Example 2i, we know that the expected number of times that the die need be rolled until all sides have appeared at least once is 6(1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 14.7. Now, if we let $X_i$ denote the total number of times that side i appears, then, since $\sum_{i=1}^{6} X_i$ is equal to the total number of rolls, we have

$$14.7 = E\left[\sum_{i=1}^{6} X_i\right] = \sum_{i=1}^{6} E[X_i]$$

But, by symmetry, $E[X_i]$ will be the same for all i, and thus it follows from the preceding that $E[X_1] = 14.7/6 = 2.45$.
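The arithmetic in this solution can be reproduced exactly with rational numbers. The sketch below is only a check of the two figures quoted above (14.7 and 2.45); the function name is an illustrative choice.

```python
from fractions import Fraction

def expected_rolls_until_all_sides(sides=6):
    """Coupon-collector mean: sides * (1 + 1/2 + ... + 1/sides)."""
    return sides * sum(Fraction(1, k) for k in range(1, sides + 1))

expected_total = expected_rolls_until_all_sides(6)   # expected number of rolls
expected_per_side = expected_total / 6                # by symmetry, E[X_1]
print(float(expected_total), float(expected_per_side))
```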

7.5. Let $I_j$ equal 1 if we win 1 when the jth red card to show is turned over, and let $I_j$ equal 0 otherwise. (For instance, $I_1$ will equal 1 if the first card turned over is red.) Hence, if X is our total winnings, then

$$E[X] = E\left[\sum_{j=1}^{n} I_j\right] = \sum_{j=1}^{n} E[I_j]$$

Now, $I_j$ will equal 1 if j red cards appear before j black cards. By symmetry, the probability of this event is equal to 1/2; therefore, $E[I_j] = 1/2$ and $E[X] = n/2$.

7.6. To see that $N \le n - 1 + I$, note that if all events occur, then both sides of the preceding inequality are equal to n, whereas if they do not all occur, then the inequality reduces to $N \le n - 1$, which is clearly true in this case. Taking expectations yields

$$E[N] \le n - 1 + E[I]$$

However, if we let $I_i$ equal 1 if $A_i$ occurs and 0 otherwise, then

$$E[N] = E\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} E[I_i] = \sum_{i=1}^{n} P(A_i)$$

Since $E[I] = P(A_1 \cdots A_n)$, the result follows.
7.7. Imagine that the values 1, 2, ..., n are lined up in their numerical order and that the k values selected are considered special. From Example 3e, the position of the first special value, equal to the smallest value chosen, has mean

$$1 + \frac{n-k}{k+1} = \frac{n+1}{k+1}$$

For a more formal argument, note that $X \ge j$ if none of the j − 1 smallest values are chosen. Hence,

$$P\{X \ge j\} = \frac{\binom{n-j+1}{k}}{\binom{n}{k}} = \frac{\binom{n-k}{j-1}}{\binom{n}{j-1}}$$

which shows that X has the same distribution as the random variable of Example 3e (with the notational change that the total number of balls is now n and the number of special balls is k).

7.8. Let X denote the number of families that depart after the Sanchez family leaves. Arbitrarily number all the N − 1 non-Sanchez families, and let $I_r$, $1 \le r \le N - 1$, equal 1 if family r departs after the Sanchez family does. Then

$$X = \sum_{r=1}^{N-1} I_r$$

Taking expectations gives

$$E[X] = \sum_{r=1}^{N-1} P\{\text{family } r \text{ departs after the Sanchez family}\}$$

Now consider any non-Sanchez family that checked in k pieces of luggage. Because each of the k + j pieces of luggage checked in either by this family or by the Sanchez family is equally likely to be the last of these k + j to appear, the probability that this family departs after the Sanchez family is $\frac{k}{k+j}$. Because the number of non-Sanchez families who checked in k pieces of luggage is $n_k$ when $k \ne j$, or $n_j - 1$ when k = j, we obtain

$$E[X] = \sum_{k} \frac{k\, n_k}{k+j} - \frac{1}{2}$$


7.9. Let the neighborhood of any point on the rim be the arc starting at that point and extending for a length 1. Consider a uniformly chosen point on the rim of the circle (that is, the probability that this point lies on a specified arc of length x is $\frac{x}{2\pi}$) and let X denote the number of points that lie in its neighborhood. With $I_j$ defined to equal 1 if item number j is in the neighborhood of the random point and to equal 0 otherwise, we have

$$X = \sum_{j=1}^{19} I_j$$

Taking expectations gives

$$E[X] = \sum_{j=1}^{19} P\{\text{item } j \text{ lies in the neighborhood of the random point}\}$$

But because item j will lie in its neighborhood if the random point is located on the arc of length 1 going from item j in the counterclockwise direction, it follows that

$$P\{\text{item } j \text{ lies in the neighborhood of the random point}\} = \frac{1}{2\pi}$$

Hence,

$$E[X] = \frac{19}{2\pi} > 3$$

Because E[X] > 3, at least one of the possible values of X must exceed 3, proving the result.
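7.10. If $g(x) = x^{1/2}$, then

$$g'(x) = \tfrac{1}{2} x^{-1/2}, \qquad g''(x) = -\tfrac{1}{4} x^{-3/2}$$

so the Taylor series expansion of $\sqrt{x}$ about $\lambda$ gives

$$\sqrt{X} \approx \sqrt{\lambda} + \tfrac{1}{2}\lambda^{-1/2}(X - \lambda) - \tfrac{1}{8}\lambda^{-3/2}(X - \lambda)^2$$

Taking expectations yields

$$E[\sqrt{X}] \approx \sqrt{\lambda} + \tfrac{1}{2}\lambda^{-1/2} E[X - \lambda] - \tfrac{1}{8}\lambda^{-3/2} E[(X - \lambda)^2] = \sqrt{\lambda} - \tfrac{1}{8}\lambda^{-3/2}\lambda = \sqrt{\lambda} - \tfrac{1}{8}\lambda^{-1/2}$$

Hence,

$$\text{Var}(\sqrt{X}) = E[X] - (E[\sqrt{X}])^2 \approx \lambda - \left(\sqrt{\lambda} - \tfrac{1}{8}\lambda^{-1/2}\right)^2 = \tfrac{1}{4} - \tfrac{1}{64\lambda} \approx 1/4$$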
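The approximation can be compared with a direct numerical summation. The sketch below assumes, as the expansion about λ with E[(X − λ)²] = λ suggests, that X is Poisson with mean λ; the value λ = 50, the truncation rule, and the function name are illustrative assumptions.

```python
from math import exp, lgamma, log, sqrt

def sqrt_poisson_moments(lam, tail=12.0):
    """Return (E[sqrt(X)], Var(sqrt(X))) for X ~ Poisson(lam) by summation."""
    # Truncate far in the upper tail; the omitted mass is negligible.
    upper = int(lam + tail * sqrt(lam)) + 20
    mean_sqrt = 0.0
    for k in range(upper + 1):
        # Poisson mass computed in log space to avoid overflow.
        pmf = exp(-lam + k * log(lam) - lgamma(k + 1))
        mean_sqrt += sqrt(k) * pmf
    variance = lam - mean_sqrt ** 2   # E[(sqrt X)^2] = E[X] = lam
    return mean_sqrt, variance

lam = 50.0
mean_sqrt, variance = sqrt_poisson_moments(lam)
print(mean_sqrt, sqrt(lam) - 1.0 / (8.0 * sqrt(lam)), variance)
```

For moderately large λ the computed variance should sit close to 1/4, and the computed mean close to $\sqrt{\lambda} - \frac{1}{8}\lambda^{-1/2}$.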


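7.11. Number the tables so that tables 1, 2, and 3 are the ones with four seats and tables 4, 5, 6, and 7 are the ones with two seats. Also, number the women, and let $X_{i,j}$ equal 1 if woman i is seated with her husband at table j. Note that

$$E[X_{i,j}] = \frac{\binom{2}{2}\binom{18}{2}}{\binom{20}{4}} = \frac{3}{95}, \quad j = 1, 2, 3$$

and

$$E[X_{i,j}] = \frac{1}{\binom{20}{2}} = \frac{1}{190}, \quad j = 4, 5, 6, 7$$

Now, if X denotes the number of married couples that are seated at the same table, we have

$$E[X] = E\left[\sum_{i=1}^{10}\sum_{j=1}^{7} X_{i,j}\right] = \sum_{i=1}^{10}\sum_{j=1}^{3} E[X_{i,j}] + \sum_{i=1}^{10}\sum_{j=4}^{7} E[X_{i,j}] = 30\left(\frac{3}{95}\right) + 40\left(\frac{1}{190}\right) = \frac{22}{19}$$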


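7.12. Let $X_i$ equal 1 if individual i does not recruit anyone, and let $X_i$ equal 0 otherwise. Then

$$E[X_i] = P\{i \text{ does not recruit any of } i+1, i+2, \ldots, n\} = \frac{i-1}{i} \cdot \frac{i}{i+1} \cdots \frac{n-2}{n-1} = \frac{i-1}{n-1}$$

Hence,

$$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \frac{i-1}{n-1} = \frac{n}{2}$$

From the preceding, we also obtain

$$\text{Var}(X_i) = \frac{i-1}{n-1}\left(1 - \frac{i-1}{n-1}\right) = \frac{(i-1)(n-i)}{(n-1)^2}$$

Now, for i < j,

$$E[X_i X_j] = \frac{i-1}{i} \cdots \frac{j-2}{j-1} \cdot \frac{j-2}{j} \cdot \frac{j-1}{j+1} \cdots \frac{n-3}{n-1} = \frac{(i-1)(j-2)}{(n-2)(n-1)}$$

Thus,

$$\text{Cov}(X_i, X_j) = \frac{(i-1)(j-2)}{(n-2)(n-1)} - \frac{i-1}{n-1} \cdot \frac{j-1}{n-1} = \frac{(i-1)(j-n)}{(n-2)(n-1)^2}$$

Therefore,

$$\text{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \text{Cov}(X_i, X_j)$$

$$= \sum_{i=1}^{n} \frac{(i-1)(n-i)}{(n-1)^2} + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{(i-1)(j-n)}{(n-2)(n-1)^2}$$

$$= \frac{1}{(n-1)^2}\sum_{i=1}^{n} (i-1)(n-i) - \frac{1}{(n-2)(n-1)^2}\sum_{i=1}^{n-1} (i-1)(n-i)(n-i-1)$$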
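The mean n/2 and the closing variance expression can be checked by brute force for a small group. The sketch below assumes the recruitment model implied by the product (i − 1)/i · i/(i + 1) ···, namely that individual k is recruited by a uniformly chosen member of {1, ..., k − 1}; the function names and the choice n = 5 are illustrative.

```python
from fractions import Fraction
from itertools import product

def enumerated_moments(n):
    """Exact mean and variance of the number of non-recruiters, by enumeration."""
    # Assumption: individual k (k = 2..n) is recruited by a uniformly chosen
    # member of {1, ..., k-1}, so every tuple of recruiters is equally likely.
    counts = []
    for recruiters in product(*[range(1, k) for k in range(2, n + 1)]):
        counts.append(n - len(set(recruiters)))
    mean = Fraction(sum(counts), len(counts))
    second = Fraction(sum(c * c for c in counts), len(counts))
    return mean, second - mean ** 2

def formula_variance(n):
    """Closed-form variance from the solution (requires n >= 3)."""
    first = sum(Fraction((i - 1) * (n - i), (n - 1) ** 2)
                for i in range(1, n + 1))
    second = sum(Fraction((i - 1) * (n - i) * (n - i - 1),
                          (n - 2) * (n - 1) ** 2)
                 for i in range(1, n))
    return first - second

mean5, var5 = enumerated_moments(5)
print(mean5, var5, formula_variance(5))
```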


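7.13. Let $X_i$ equal 1 if the ith triple consists of one of each type of player. Then

$$E[X_i] = \frac{\binom{2}{1}\binom{3}{1}\binom{4}{1}}{\binom{9}{3}} = \frac{2}{7}$$

Hence, for part (a), we obtain

$$E\left[\sum_{i=1}^{3} X_i\right] = 6/7$$

It follows from the preceding that

$$\text{Var}(X_i) = (2/7)(1 - 2/7) = 10/49$$

Also, for $i \ne j$,

$$E[X_i X_j] = P\{X_i = 1, X_j = 1\} = P\{X_i = 1\}\,P\{X_j = 1 \mid X_i = 1\} = \frac{2}{7} \cdot \frac{\binom{1}{1}\binom{2}{1}\binom{3}{1}}{\binom{6}{3}} = 6/70$$

Hence, for part (b), we obtain

$$\text{Var}\left(\sum_{i=1}^{3} X_i\right) = \sum_{i=1}^{3} \text{Var}(X_i) + 2\sum\sum_{j > i} \text{Cov}(X_i, X_j) = 30/49 + 2\binom{3}{2}\left(\frac{6}{70} - \frac{4}{49}\right) = \frac{312}{490}$$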

7.14. Let $X_i$, i = 1, ..., 13, equal 1 if the ith card is an ace and let $X_i$ be 0 otherwise. Let $Y_j$ equal 1 if the jth card is a spade and let $Y_j = 0$ otherwise. Now,

$$\text{Cov}(X, Y) = \text{Cov}\left(\sum_{i=1}^{13} X_i, \sum_{j=1}^{13} Y_j\right) = \sum_{i=1}^{13}\sum_{j=1}^{13} \text{Cov}(X_i, Y_j)$$

However, $X_i$ is clearly independent of $Y_j$ because knowing the suit of a particular card gives no information about whether it is an ace and thus cannot affect the probability that another specified card is an ace. More formally, let $A_{i,s}, A_{i,h}, A_{i,d}, A_{i,c}$ be the events, respectively, that card i is a spade, a heart, a diamond, and a club. Then

$$P\{Y_j = 1\} = \frac{1}{4}\left(P\{Y_j = 1 \mid A_{i,s}\} + P\{Y_j = 1 \mid A_{i,h}\} + P\{Y_j = 1 \mid A_{i,d}\} + P\{Y_j = 1 \mid A_{i,c}\}\right)$$

But, by symmetry, we have

$$P\{Y_j = 1 \mid A_{i,s}\} = P\{Y_j = 1 \mid A_{i,h}\} = P\{Y_j = 1 \mid A_{i,d}\} = P\{Y_j = 1 \mid A_{i,c}\}$$

Therefore,

$$P\{Y_j = 1\} = P\{Y_j = 1 \mid A_{i,s}\}$$

As the preceding implies that

$$P\{Y_j = 1\} = P\{Y_j = 1 \mid A_{i,s}^c\}$$

we see that $Y_j$ and $X_i$ are independent. Hence, $\text{Cov}(X_i, Y_j) = 0$, and thus Cov(X, Y) = 0.
The random variables X and Y, although uncorrelated, are not independent. This follows, for instance, from the fact that

$$P\{Y = 13 \mid X = 4\} = 0 \ne P\{Y = 13\}$$


7.15.
a. Your expected gain without any information is 0.
b. You should predict heads if p > 1/2 and tails otherwise.
c. Conditioning on V, the value of the coin, gives

$$E[\text{Gain}] = \int_0^1 E[\text{Gain} \mid V = p]\,dp = \int_0^{1/2} [1(1 - p) - 1(p)]\,dp + \int_{1/2}^{1} [1(p) - 1(1 - p)]\,dp = 1/2$$


7.16. Given that the name chosen appears in n(X) different positions on the list, since each of these positions is equally likely to be the one chosen, it follows that

$$E[I \mid n(X)] = P\{I = 1 \mid n(X)\} = 1/n(X)$$

Hence,

$$E[I] = E[1/n(X)]$$

Thus, $E[mI] = E[m/n(X)] = d$.

7.17. Letting $X_i$ equal 1 if a collision occurs when the ith item is placed, and letting it equal 0 otherwise, we can express the total number of collisions X as

$$X = \sum_{i=1}^{m} X_i$$

Therefore,

$$E[X] = \sum_{i=1}^{m} E[X_i]$$

To determine $E[X_i]$, condition on the cell in which it is placed.

$$E[X_i] = \sum_{j} E[X_i \mid \text{placed in cell } j]\,p_j$$

$$= \sum_{j} P\{i \text{ causes collision} \mid \text{placed in cell } j\}\,p_j$$

$$= \sum_{j} [1 - (1 - p_j)^{i-1}]\,p_j$$

$$= 1 - \sum_{j} (1 - p_j)^{i-1} p_j$$

The next to last equality used the fact that, conditional on item i being placed in cell j, item i will cause a collision if any of the preceding i − 1 items were put in cell j. Thus,

$$E[X] = m - \sum_{i=1}^{m}\sum_{j=1}^{n} (1 - p_j)^{i-1} p_j$$

Interchanging the order of the summations gives

$$E[X] = m - n + \sum_{j=1}^{n} (1 - p_j)^m$$


Looking at the result shows that we could have derived it more easily 


by taking expectations of both sides of the identity 
number of nonempty cells = m — X 


The expected number of nonempty cells is then found by defining an 
indicator variable for each cell, equal to 1 if that cell is nonempty and to 
0 otherwise, and then taking the expectation of the sum of these 
indicator variables. 
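The closed form $E[X] = m - n + \sum_j (1 - p_j)^m$ can be verified exactly on a small instance by enumerating every placement. In the sketch below, the three cell probabilities, the choice m = 4, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def expected_collisions_formula(m, probs):
    """m - n + sum_j (1 - p_j)^m for m items and n = len(probs) cells."""
    return m - len(probs) + sum((1 - p) ** m for p in probs)

def expected_collisions_enumerated(m, probs):
    """Exact expectation obtained by enumerating all n^m placements."""
    total = Fraction(0)
    for cells in product(range(len(probs)), repeat=m):
        weight = Fraction(1)
        occupied = set()
        collisions = 0
        for c in cells:
            weight *= probs[c]
            if c in occupied:
                collisions += 1      # item lands in an already occupied cell
            else:
                occupied.add(c)
        total += weight * collisions
    return total

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
by_formula = expected_collisions_formula(4, probs)
by_enumeration = expected_collisions_enumerated(4, probs)
print(by_formula, by_enumeration)
```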

7.18. Let L denote the length of the initial run. Conditioning on the first value gives

$$E[L] = E[L \mid \text{first value is one}]\frac{n}{n+m} + E[L \mid \text{first value is zero}]\frac{m}{n+m}$$

Now, if the first value is one, then the length of the run will be the position of the first zero when considering the remaining n + m − 1 values, of which n − 1 are ones and m are zeroes. (For instance, if the initial value of the remaining n + m − 1 is zero, then L = 1.) As a similar result is true given that the first value is a zero, we obtain from the preceding, upon using the result from Example 3e, that

$$E[L] = \frac{n+m}{m+1} \cdot \frac{n}{n+m} + \frac{n+m}{n+1} \cdot \frac{m}{n+m} = \frac{n}{m+1} + \frac{m}{n+1}$$
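For small n and m the result $E[L] = \frac{n}{m+1} + \frac{m}{n+1}$ can be confirmed by listing every equally likely arrangement of n ones and m zeroes. The function name and the test sizes below are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def mean_initial_run(n, m):
    """Exact mean length of the initial run over all arrangements (n, m >= 1)."""
    length = n + m
    run_sum = 0
    count = 0
    for one_positions in combinations(range(length), n):
        ones = set(one_positions)
        seq = [1 if i in ones else 0 for i in range(length)]
        run = 1
        while run < length and seq[run] == seq[0]:
            run += 1
        run_sum += run
        count += 1
    return Fraction(run_sum, count)

n, m = 3, 2
print(mean_initial_run(n, m), Fraction(n, m + 1) + Fraction(m, n + 1))
```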


7.19. Let X be the number of flips needed for both boxes to become empty, and let Y denote the number of heads in the first n + m flips. Then

$$E[X] = \sum_{i=0}^{n+m} E[X \mid Y = i]\,P\{Y = i\} = \sum_{i=0}^{n+m} E[X \mid Y = i]\binom{n+m}{i} p^i (1-p)^{n+m-i}$$

Now, if the number of heads in the first n + m flips is i, $i \le n$, then the number of additional flips is the number of flips needed to obtain an additional n − i heads. Similarly, if the number of heads in the first n + m flips is i, i > n, then, because there would have been a total of n + m − i < m tails, the number of additional flips is the number needed to obtain an additional i − n tails. Since the number of flips needed for j outcomes of a particular type is a negative binomial random variable whose mean is j divided by the probability of that outcome, we obtain

$$E[X] = n + m + \sum_{i=0}^{n} \frac{n-i}{p}\binom{n+m}{i} p^i (1-p)^{n+m-i} + \sum_{i=n+1}^{n+m} \frac{i-n}{1-p}\binom{n+m}{i} p^i (1-p)^{n+m-i}$$


7.20. Taking expectations of both sides of the identity given in the hint yields

$$E[X^n] = E\left[n\int_0^{\infty} x^{n-1} I_X(x)\,dx\right]$$

$$= n\int_0^{\infty} E[x^{n-1} I_X(x)]\,dx$$

$$= n\int_0^{\infty} x^{n-1} E[I_X(x)]\,dx$$

$$= n\int_0^{\infty} x^{n-1}\bar{F}(x)\,dx$$

Taking the expectation inside the integral sign is justified because all the random variables $I_X(x)$, $0 < x < \infty$, are nonnegative.

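7.21. Consider a random permutation $I_1, \ldots, I_n$ that is equally likely to be any of the n! permutations. Then

$$E[a_{I_j} a_{I_{j+1}}] = \sum_{k} E[a_{I_j} a_{I_{j+1}} \mid I_j = k]\,P\{I_j = k\}$$

$$= \frac{1}{n}\sum_{k} a_k E[a_{I_{j+1}} \mid I_j = k]$$

$$= \frac{1}{n}\sum_{k} a_k \sum_{i} a_i P\{I_{j+1} = i \mid I_j = k\}$$

$$= \frac{1}{n(n-1)}\sum_{k} a_k \sum_{i \ne k} a_i$$

$$= \frac{1}{n(n-1)}\sum_{k} a_k(-a_k)$$

$$< 0$$

where the final equality followed from the assumption that $\sum_{i=1}^{n} a_i = 0$. Since the preceding shows that

$$E\left[\sum_{j=1}^{n-1} a_{I_j} a_{I_{j+1}}\right] < 0$$

it follows that there must be some permutation $i_1, \ldots, i_n$ for which

$$\sum_{j=1}^{n-1} a_{i_j} a_{i_{j+1}} < 0$$

7.22.
a. $E[X] = \lambda_1 + \lambda_2$, $E[Y] = \lambda_2 + \lambda_3$
b.

$$\text{Cov}(X, Y) = \text{Cov}(X_1 + X_2, X_2 + X_3)$$

$$= \text{Cov}(X_1, X_2 + X_3) + \text{Cov}(X_2, X_2 + X_3)$$

$$= \text{Cov}(X_2, X_2) = \text{Var}(X_2) = \lambda_2$$

c. Conditioning on $X_2$ gives

$$P\{X = i, Y = j\} = \sum_{k} P\{X = i, Y = j \mid X_2 = k\}\,P\{X_2 = k\}$$

$$= \sum_{k} P\{X_1 = i - k, X_3 = j - k \mid X_2 = k\}\,e^{-\lambda_2}\lambda_2^k/k!$$

$$= \sum_{k} P\{X_1 = i - k, X_3 = j - k\}\,e^{-\lambda_2}\lambda_2^k/k!$$

$$= \sum_{k} P\{X_1 = i - k\}\,P\{X_3 = j - k\}\,e^{-\lambda_2}\lambda_2^k/k!$$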


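7.23.

$$\text{Corr}\left(\sum_{i} X_i, \sum_{j} Y_j\right) = \frac{\text{Cov}\left(\sum_{i} X_i, \sum_{j} Y_j\right)}{\sqrt{n\sigma_x^2\, n\sigma_y^2}}$$

$$= \frac{\sum_{i} \text{Cov}(X_i, Y_i) + \sum_{i}\sum_{j \ne i} \text{Cov}(X_i, Y_j)}{n\sigma_x\sigma_y}$$

$$= \frac{n\rho\sigma_x\sigma_y}{n\sigma_x\sigma_y}$$

$$= \rho$$

where the next to last equality used the fact that $\text{Cov}(X_i, Y_i) = \rho\sigma_x\sigma_y$.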
7.24. Let $X_i$ equal 1 if the ith card chosen is an ace, and let it equal 0 otherwise. Because

$$X = \sum_{i=1}^{3} X_i$$

and $E[X_i] = P\{X_i = 1\} = 1/13$, it follows that E[X] = 3/13. But, with A being the event that the ace of spades is chosen, we have

$$E[X] = E[X \mid A]P(A) + E[X \mid A^c]P(A^c)$$

$$= E[X \mid A]\frac{3}{52} + E[X \mid A^c]\frac{49}{52}$$

$$= E[X \mid A]\frac{3}{52} + \frac{49}{52}E\left[\sum_{i=1}^{3} X_i \,\Big|\, A^c\right]$$

$$= E[X \mid A]\frac{3}{52} + \frac{49}{52}\sum_{i=1}^{3} E[X_i \mid A^c]$$

$$= E[X \mid A]\frac{3}{52} + \frac{49}{52} \cdot 3 \cdot \frac{3}{51}$$

Using that E[X] = 3/13 gives the result

$$E[X \mid A] = \frac{52}{3}\left(\frac{3}{13} - \frac{49}{52} \cdot \frac{3}{17}\right) = \frac{19}{17} = 1.1176$$

Similarly, letting L be the event that at least one ace is chosen, we have

$$E[X] = E[X \mid L]P(L) + E[X \mid L^c]P(L^c) = E[X \mid L]P(L) = E[X \mid L]\left(1 - \frac{48 \cdot 47 \cdot 46}{52 \cdot 51 \cdot 50}\right)$$

Thus,

$$E[X \mid L] = \frac{3/13}{1 - \dfrac{48 \cdot 47 \cdot 46}{52 \cdot 51 \cdot 50}} \approx 1.0616$$

Another way to solve this problem is to number the four aces, with the ace of spades having number 1, and then let $Y_i$ equal 1 if ace number i is chosen and 0 otherwise. Then

$$E[X \mid A] = E\left[\sum_{i=1}^{4} Y_i \,\Big|\, Y_1 = 1\right] = 1 + \sum_{i=2}^{4} E[Y_i \mid Y_1 = 1] = 1 + 3 \cdot \frac{2}{51} = 19/17$$

where we used the fact that, given that the ace of spades is chosen, the other two cards are equally likely to be any pair of the remaining 51 cards; so the conditional probability that any specified card (not equal to the ace of spades) is chosen is 2/51. Also,

$$E[X \mid L] = E\left[\sum_{i=1}^{4} Y_i \,\Big|\, L\right] = \sum_{i=1}^{4} E[Y_i \mid L] = 4P\{Y_1 = 1 \mid L\}$$

Because

$$P\{Y_1 = 1 \mid L\} = P(A \mid L) = \frac{P(AL)}{P(L)} = \frac{P(A)}{P(L)} = \frac{3/52}{1 - \dfrac{48 \cdot 47 \cdot 46}{52 \cdot 51 \cdot 50}}$$

we obtain the same answer as before.
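7.25.
a. $E[I \mid X = x] = P\{Z < X \mid X = x\} = P\{Z < x \mid X = x\} = P\{Z < x\} = \Phi(x)$
b. It follows from part (a) that $E[I \mid X] = \Phi(X)$. Therefore,

$$E[I] = E[E[I \mid X]] = E[\Phi(X)]$$

The result now follows because $E[I] = P\{I = 1\} = P\{Z < X\}$.
c. Since X − Z is normal with mean $\mu$ and variance 2, we have

$$P\{X > Z\} = P\{X - Z > 0\} = P\left\{\frac{X - Z - \mu}{\sqrt{2}} > \frac{-\mu}{\sqrt{2}}\right\} = 1 - \Phi\left(\frac{-\mu}{\sqrt{2}}\right) = \Phi\left(\frac{\mu}{\sqrt{2}}\right)$$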


7.26. Let N be the number of heads in the first n + m − 1 flips. Let M = max(X, Y) be the number of flips needed to amass at least n heads and at least m tails. Conditioning on N gives

$$E[M] = \sum_{i} E[M \mid N = i]\,P\{N = i\} = \sum_{i=0}^{n-1} E[M \mid N = i]\,P\{N = i\} + \sum_{i=n}^{n+m-1} E[M \mid N = i]\,P\{N = i\}$$

Now, suppose we are given that there are a total of i heads in the first n + m − 1 trials. If i < n, then we have already obtained at least m tails, so the additional number of flips needed is equal to the number needed for an additional n − i heads; similarly, if $i \ge n$, then we have already obtained at least n heads, so the additional number of flips needed is equal to the number needed for an additional m − (n + m − 1 − i) tails. Consequently, we have

$$E[M] = \sum_{i=0}^{n-1}\left(n + m - 1 + \frac{n-i}{p}\right)P\{N = i\} + \sum_{i=n}^{n+m-1}\left(n + m - 1 + \frac{i + 1 - n}{1-p}\right)P\{N = i\}$$

$$= n + m - 1 + \sum_{i=0}^{n-1} \frac{n-i}{p}\binom{n+m-1}{i} p^i (1-p)^{n+m-1-i} + \sum_{i=n}^{n+m-1} \frac{i+1-n}{1-p}\binom{n+m-1}{i} p^i (1-p)^{n+m-1-i}$$

The expected number of flips to obtain either n heads or m tails, E[min(X, Y)], is now given by

$$E[\min(X, Y)] = E[X + Y - M] = \frac{n}{p} + \frac{m}{1-p} - E[M]$$
7.27. This is just the expected time to collect n − 1 of the n types of coupons in Example 2i. By the results of that example the solution is

$$1 + \frac{n}{n-1} + \frac{n}{n-2} + \cdots + \frac{n}{2}$$

7.28. With q = 1 − p,

$$\text{Cov}(X, Y) = E[XY] - E[X]E[Y] = P(X = 1, Y = 1) - P(X = 1)P(Y = 1)$$

Hence,

$$\text{Cov}(X, Y) = 0 \iff P(X = 1, Y = 1) = P(X = 1)P(Y = 1)$$

Because

$$\text{Cov}(X, Y) = \text{Cov}(1 - X, 1 - Y) = -\text{Cov}(1 - X, Y) = -\text{Cov}(X, 1 - Y)$$

the preceding shows that all of the following are equivalent when X and Y are Bernoulli:
A. Cov(X, Y) = 0
B. P(X = 1, Y = 1) = P(X = 1)P(Y = 1)
C. P(1 − X = 1, 1 − Y = 1) = P(1 − X = 1)P(1 − Y = 1)
D. P(1 − X = 1, Y = 1) = P(1 − X = 1)P(Y = 1)
E. P(X = 1, 1 − Y = 1) = P(X = 1)P(1 − Y = 1)


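7.30. Number the individuals, and let $X_{i,j}$ equal 1 if the jth individual who has hat size i chooses a hat of that size, and let $X_{i,j}$ equal 0 otherwise. Then the number of individuals who choose a hat of their size is

$$X = \sum_{i=1}^{r}\sum_{j=1}^{n_i} X_{i,j}$$

Hence,

$$E[X] = \sum_{i=1}^{r}\sum_{j=1}^{n_i} E[X_{i,j}] = \sum_{i=1}^{r}\sum_{j=1}^{n_i} \frac{h_i}{n} = \frac{1}{n}\sum_{i=1}^{r} h_i n_i$$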

7.31. Letting $\sigma_x^2$ and $\sigma_y^2$ be, respectively, the variances of X and of Y, we obtain, upon squaring both sides, the equivalent inequality

$$\text{Var}(X + Y) \le \sigma_x^2 + \sigma_y^2 + 2\sigma_x\sigma_y$$

Using that $\text{Var}(X + Y) = \sigma_x^2 + \sigma_y^2 + 2\text{Cov}(X, Y)$, the preceding inequality becomes

$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_x\sigma_y} \le 1$$

which has already been established.

7.32. Noting that X is equal to i plus the number of the values $R_{n+1}, \ldots, R_{n+m}$ that are smaller than X, it follows that if we let $I_{n+k}$ equal 1 if $R_{n+k} < X$ and let it equal 0 otherwise, that

$$X = i + \sum_{k=1}^{m} I_{n+k}$$

Now,

$$E[I_{n+k}] = P\{R_{n+k} < X\}$$

$$= P\{R_{n+k} < i\text{th smallest of } R_1, \ldots, R_n\}$$

$$= P\{R_{n+k} \text{ is one of the } i \text{ smallest of the values } R_1, \ldots, R_n, R_{n+k}\}$$

$$= \frac{i}{n+1}$$

where the final equality used that $R_{n+k}$ is equally likely to be either the smallest, the second smallest, ..., or the (n + 1)st smallest of the values $R_1, \ldots, R_n, R_{n+k}$. Hence,

$$E[X] = i + m\frac{i}{n+1}$$

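7.33.
a. $E[X] = \int_0^1 E[X \mid Y = y]\,dy = \int_0^1 \frac{y}{2}\,dy = 1/4$

b. $E[XY] = \int_0^1 E[XY \mid Y = y]\,dy = \int_0^1 \frac{y^2}{2}\,dy = 1/6$, which gives that

$$\text{Cov}(X, Y) = 1/6 - 1/8 = 1/24$$

c. $E[X^2] = \int_0^1 E[X^2 \mid Y = y]\,dy = \int_0^1 \frac{y^2}{3}\,dy = 1/9$, giving that

$$\text{Var}(X) = \frac{1}{9} - \frac{1}{16} = \frac{7}{144}$$

d.

$$P\{X \le x\} = \int_0^1 P\{X \le x \mid Y = y\}\,dy$$

$$= \int_0^x P\{X \le x \mid Y = y\}\,dy + \int_x^1 P\{X \le x \mid Y = y\}\,dy$$

$$= \int_0^x dy + \int_x^1 \frac{x}{y}\,dy$$

$$= x - x\log(x)$$

e. Differentiate part (d) to obtain the density $f_X(x) = -\log(x)$, 0 < x < 1.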


Chapter 8 


8.1. Let X denote the number of sales made next week, and note that X is 
integral. From Markov’s inequality, we obtain the following: 

a. $P\{X > 18\} = P\{X \ge 19\} \le \dfrac{E[X]}{19} = 16/19$

b. $P\{X > 25\} = P\{X \ge 26\} \le \dfrac{E[X]}{26} = 16/26$


8.2.
a.

$$P\{10 \le X \le 22\} = P\{|X - 16| \le 6\} = P\{|X - \mu| \le 6\} = 1 - P\{|X - \mu| > 6\} \ge 1 - 9/36 = 3/4$$

b. $P\{X \ge 19\} = P\{X - 16 \ge 3\} \le \dfrac{9}{9 + 9} = 1/2$

In part (a), we used Chebyshev's inequality; in part (b), we used its one-sided version. (See Proposition 5.1.)


8.3. First note that E[X − Y] = 0 and

$$\text{Var}(X - Y) = \text{Var}(X) + \text{Var}(Y) - 2\text{Cov}(X, Y) = 28$$

Using Chebyshev's inequality in part (a) and the one-sided version in parts (b) and (c) gives the following results:
a. $P\{|X - Y| > 15\} \le 28/225$

b. $P\{X - Y > 15\} \le \dfrac{28}{28 + 225} = 28/253$

c. $P\{Y - X > 15\} \le \dfrac{28}{28 + 225} = 28/253$


8.4. If X is the number produced at factory A and Y the number produced at factory B, then

$$E[Y - X] = -2, \quad \text{Var}(Y - X) = 36 + 9 = 45$$

$$P\{Y - X > 0\} = P\{Y - X \ge 1\} = P\{Y - X + 2 \ge 3\} \le \frac{45}{45 + 9} = 45/54$$
8.5. Note first that

$$E[X_i] = \int_0^1 2x^2\,dx = 2/3$$

Now use the strong law of large numbers to obtain

$$r = \lim_{n \to \infty} \frac{n}{S_n} = \lim_{n \to \infty} \frac{1}{S_n/n} = \frac{1}{\lim_{n \to \infty} S_n/n} = 1/(2/3) = 3/2$$
8.6. Because $E[X_i] = 2/3$ and

$$E[X_i^2] = \int_0^1 2x^3\,dx = 1/2$$

we have $\text{Var}(X_i) = 1/2 - (2/3)^2 = 1/18$. Thus, if there are n components on hand, then

$$P\{S_n \ge 35\} = P\{S_n \ge 34.5\} \quad \text{(the continuity correction)}$$

$$= P\left\{\frac{S_n - 2n/3}{\sqrt{n/18}} \ge \frac{34.5 - 2n/3}{\sqrt{n/18}}\right\}$$

$$\approx P\left\{Z \ge \frac{34.5 - 2n/3}{\sqrt{n/18}}\right\}$$

where Z is a standard normal random variable. Since

$$P\{Z > -1.284\} = P\{Z < 1.284\} \approx .90$$

we see that n should be chosen so that

$$(34.5 - 2n/3) \approx -1.284\sqrt{n/18}$$


A numerical computation gives the result n = 55. 
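One way to carry out the numerical computation is bisection on the equation $34.5 - 2n/3 = -1.284\sqrt{n/18}$, treating n as a real number. This is only a sketch of such a computation, not necessarily the method the text had in mind; the function names and the bracketing interval [1, 200] are assumptions.

```python
from math import sqrt

def gap(n, z=1.284):
    """Left side minus right side of 34.5 - 2n/3 = -z * sqrt(n/18)."""
    return (34.5 - 2.0 * n / 3.0) + z * sqrt(n / 18.0)

def solve_n(lo=1.0, hi=200.0, tol=1e-9):
    """Bisection: gap is positive at lo, negative at hi, decreasing between."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

root = solve_n()
print(root)
```

The root lies just above 55, which is consistent with the value n = 55 quoted above.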


8.7. If X is the time required to service a machine, then

$$E[X] = .2 + .3 = .5$$

Also, since the variance of an exponential random variable is equal to the square of its mean, we have

$$\text{Var}(X) = (.2)^2 + (.3)^2 = .13$$

Therefore, with $X_i$ being the time required to service job i, i = 1, ..., 20, and Z being a standard normal random variable, it follows that

$$P\{X_1 + \cdots + X_{20} < 8\} = P\left\{\frac{X_1 + \cdots + X_{20} - 10}{\sqrt{2.6}} < \frac{8 - 10}{\sqrt{2.6}}\right\}$$

$$\approx P\{Z < -1.24035\}$$

$$\approx .1074$$


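8.8. Note first that if X is the gambler's winnings on a single bet, then

$$E[X] = -.7 - .4 + 1 = -.1, \quad E[X^2] = .7 + .8 + 10 = 11.5 \;\Rightarrow\; \text{Var}(X) = 11.49$$

Therefore, with Z having a standard normal distribution,

$$P\{X_1 + \cdots + X_{100} \le -.5\} = P\left\{\frac{X_1 + \cdots + X_{100} + 10}{\sqrt{1149}} \le \frac{-.5 + 10}{\sqrt{1149}}\right\}$$

$$\approx P\{Z \le .2803\}$$

$$= .6104$$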

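8.9. Using the notation of Problem 8.7, we have

$$P\{X_1 + \cdots + X_{20} < t\} = P\left\{\frac{X_1 + \cdots + X_{20} - 10}{\sqrt{2.6}} < \frac{t - 10}{\sqrt{2.6}}\right\} \approx P\left\{Z < \frac{t - 10}{\sqrt{2.6}}\right\}$$

Now, P{Z < 1.645} ≈ .95, so t should be such that

$$\frac{t - 10}{\sqrt{2.6}} \approx 1.645$$

which yields t ≈ 12.65.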
8.10. If the claim were true, then, by the central limit theorem, the average 
nicotine content (call it X) would approximately have a normal distribution with 
mean 2.2 and standard deviation .03. Thus, the probability that it would be as 
high as 3.1 is 


$$P\{X > 3.1\} = P\left\{\frac{X - 2.2}{.03} > \frac{3.1 - 2.2}{.03}\right\}$$

$$\approx P\{Z > 30\}$$

$$\approx 0$$


where Z is a standard normal random variable. 
8.11.
a. If we arbitrarily number the batteries and let $X_i$ denote the life of battery i, i = 1, ..., 40, then the $X_i$ are independent and identically distributed random variables. To compute the mean and variance of the life of, say, battery 1, we condition on its type. Letting I equal 1 if battery 1 is type A and letting it equal 0 if it is type B, we have

$$E[X_1 \mid I = 1] = 50, \quad E[X_1 \mid I = 0] = 30$$

yielding

$$E[X_1] = 50P\{I = 1\} + 30P\{I = 0\} = 50(1/2) + 30(1/2) = 40$$

In addition, using the fact that $E[W^2] = (E[W])^2 + \text{Var}(W)$, we have

$$E[X_1^2 \mid I = 1] = (50)^2 + (15)^2 = 2725, \quad E[X_1^2 \mid I = 0] = (30)^2 + 6^2 = 936$$

yielding

$$E[X_1^2] = (2725)(1/2) + (936)(1/2) = 1830.5$$

Thus, $X_1, \ldots, X_{40}$ are independent and identically distributed random variables having mean 40 and variance 1830.5 − 1600 = 230.5. Hence, with $S = \sum_{i=1}^{40} X_i$, we have

$$E[S] = 40(40) = 1600, \quad \text{Var}(S) = 40(230.5) = 9220$$

and the central limit theorem yields

$$P\{S > 1700\} = P\left\{\frac{S - 1600}{\sqrt{9220}} > \frac{1700 - 1600}{\sqrt{9220}}\right\} \approx P\{Z > 1.041\} = 1 - \Phi(1.041) = .149$$

b. For this part, let $S_A$ be the total life of all the type A batteries and let $S_B$ be the total life of all the type B batteries. Then, by the central limit theorem, $S_A$ has approximately a normal distribution with mean 20(50) = 1000 and variance 20(225) = 4500, and $S_B$ has approximately a normal distribution with mean 20(30) = 600 and variance 20(36) = 720. Because the sum of independent normal random variables is also a normal random variable, it follows that $S_A + S_B$ is approximately normal with mean 1600 and variance 5220. Consequently, with $S = S_A + S_B$,

$$P\{S > 1700\} = P\left\{\frac{S - 1600}{\sqrt{5220}} > \frac{1700 - 1600}{\sqrt{5220}}\right\} \approx P\{Z > 1.384\} = 1 - \Phi(1.384) = .084$$

8.12. Let N denote the number of doctors who volunteer. Conditional on the event N = i, the number of patients seen is distributed as the sum of i independent Poisson random variables with common mean 30. Because the sum of independent Poisson random variables is also a Poisson random variable, it follows that the conditional distribution of X given that N = i is Poisson with mean 30i. Therefore,

$$E[X \mid N] = 30N, \quad \text{Var}(X \mid N) = 30N$$

As a result,

$$E[X] = E[E[X \mid N]] = 30E[N] = 90$$

Also, by the conditional variance formula,

$$\text{Var}(X) = E[\text{Var}(X \mid N)] + \text{Var}(E[X \mid N]) = 30E[N] + (30)^2\text{Var}(N)$$

Because

$$\text{Var}(N) = \frac{1}{3}(2^2 + 3^2 + 4^2) - 9 = 2/3$$

we obtain Var(X) = 690.
To approximate P{X > 65}, we would not be justified in assuming that the distribution of X is approximately that of a normal random variable with mean 90 and variance 690. What we do know, however, is that

$$P\{X > 65\} = \sum_{i=2}^{4} P\{X > 65 \mid N = i\}\,P\{N = i\} = \frac{1}{3}\sum_{i=2}^{4} \bar{P}_i(65)$$

where $\bar{P}_i(65)$ is the probability that a Poisson random variable with mean 30i is greater than 65. That is,

$$\bar{P}_i(65) = 1 - \sum_{j=0}^{65} e^{-30i}(30i)^j/j!$$

Because a Poisson random variable with mean 30i has the same distribution as does the sum of 30i independent Poisson random variables with mean 1, it follows from the central limit theorem that its distribution is approximately normal with mean and variance equal to 30i. Consequently, with $X_i$ being a Poisson random variable with mean 30i and Z being a standard normal random variable, we can approximate $\bar{P}_i(65)$ as follows:

$$\bar{P}_i(65) = P\{X_i > 65\} = P\{X_i \ge 65.5\} = P\left\{\frac{X_i - 30i}{\sqrt{30i}} \ge \frac{65.5 - 30i}{\sqrt{30i}}\right\} \approx P\left\{Z \ge \frac{65.5 - 30i}{\sqrt{30i}}\right\}$$

Therefore,

$$\bar{P}_2(65) \approx P\{Z \ge .7100\} \approx .2389$$

$$\bar{P}_3(65) \approx P\{Z \ge -2.583\} \approx .9951$$

$$\bar{P}_4(65) \approx P\{Z \ge -4.975\} \approx 1$$

leading to the result

$$P\{X > 65\} \approx .7447$$

If we would have mistakenly assumed that X was approximately normal, we would have obtained the approximate answer .8244. (The exact probability is .7440.)
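The exact value quoted at the end can be reproduced by summing the three Poisson tails directly. The sketch below follows the mixture representation used in this solution (N equally likely to be 2, 3, or 4, and X given N = i Poisson with mean 30i); the function name is an illustrative choice.

```python
from math import exp

def poisson_tail(mean, k):
    """P{Y > k} for Y ~ Poisson(mean), summing the mass function up to k."""
    term = exp(-mean)          # P{Y = 0}
    cdf = term
    for j in range(1, k + 1):
        term *= mean / j       # P{Y = j} from P{Y = j - 1}
        cdf += term
    return 1.0 - cdf

# Average the conditional tails over N = 2, 3, 4, each with probability 1/3.
exact = sum(poisson_tail(30 * i, 65) for i in (2, 3, 4)) / 3.0
print(round(exact, 4))
```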
8.13. Take logarithms and then apply the strong law of large numbers to obtain

$$\log\left[\left(\prod_{i=1}^{n} X_i\right)^{1/n}\right] = \frac{1}{n}\sum_{i=1}^{n} \log(X_i) \to E[\log(X_i)]$$

Therefore,

$$\left(\prod_{i=1}^{n} X_i\right)^{1/n} \to e^{E[\log(X_i)]}$$

8.14. Let $X_i$ be the time it takes to process book i, and let $S_n = \sum_{i=1}^{n} X_i$.

a. With Z being a standard normal,

$$P\{S_{40} > 420\} = P\left\{\frac{S_{40} - 400}{\sqrt{40 \cdot 9}} > \frac{420 - 400}{\sqrt{40 \cdot 9}}\right\} \approx P\left\{Z > \frac{20}{\sqrt{360}}\right\} \approx .146$$

b.

$$P\{S_{25} \le 240\} = P\left\{\frac{S_{25} - 250}{\sqrt{25 \cdot 9}} \le \frac{240 - 250}{\sqrt{25 \cdot 9}}\right\} \approx P\left\{Z \le -\frac{10}{15}\right\} \approx .2525$$

We have assumed that the successive book processing times are independent.


8.15. Let P{X = i} = 1/n, i = 1, ..., n. Also, let $f(x) = a_x$ and $g(x) = b_x$. Then f and g are both increasing functions and so $E[f(X)g(X)] \ge E[f(X)]E[g(X)]$, which is equivalent to

$$\frac{\sum_{i=1}^{n} a_i b_i}{n} \ge \frac{\sum_{i=1}^{n} a_i}{n} \cdot \frac{\sum_{i=1}^{n} b_i}{n}$$


Chapter 9 


9.1. From axiom (iii), it follows that the number of events that 
occur between times 8 and 10 has the same distribution as the 
number of events that occur by time 2 and thus is a Poisson random 
variable with mean 6. Hence, we obtain the following solutions for 
parts (a) and (b): 
a. $P\{N(10) - N(8) = 0\} = e^{-6}$
b. $E[N(10) - N(8)] = 6$
c. It follows from axioms (ii) and (iii) that from any point in time onward, the process of events occurring is a Poisson process with rate $\lambda$. Hence, the expected time of the fifth event after 2 P.M. is $2 + E[S_5] = 2 + 5/3$. That is, the expected time of this event is 3:40 P.M.


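9.2.
a.

$$P\{N(1/3) = 2 \mid N(1) = 2\} = \frac{P\{N(1/3) = 2, N(1) = 2\}}{P\{N(1) = 2\}}$$

$$= \frac{P\{N(1/3) = 2, N(1) - N(1/3) = 0\}}{P\{N(1) = 2\}}$$

$$= \frac{P\{N(1/3) = 2\}\,P\{N(1) - N(1/3) = 0\}}{P\{N(1) = 2\}} \quad \text{(by axiom (ii))}$$

$$= \frac{P\{N(1/3) = 2\}\,P\{N(2/3) = 0\}}{P\{N(1) = 2\}} \quad \text{(by axiom (iii))}$$

$$= \frac{e^{-\lambda/3}(\lambda/3)^2/2!\; e^{-2\lambda/3}}{e^{-\lambda}\lambda^2/2!}$$

$$= 1/9$$

b.

$$P\{N(1/2) \ge 1 \mid N(1) = 2\} = 1 - P\{N(1/2) = 0 \mid N(1) = 2\}$$

$$= 1 - \frac{P\{N(1/2) = 0, N(1) = 2\}}{P\{N(1) = 2\}}$$

$$= 1 - \frac{P\{N(1/2) = 0, N(1) - N(1/2) = 2\}}{P\{N(1) = 2\}}$$

$$= 1 - \frac{P\{N(1/2) = 0\}\,P\{N(1) - N(1/2) = 2\}}{P\{N(1) = 2\}}$$

$$= 1 - \frac{P\{N(1/2) = 0\}\,P\{N(1/2) = 2\}}{P\{N(1) = 2\}}$$

$$= 1 - \frac{e^{-\lambda/2} e^{-\lambda/2}(\lambda/2)^2/2!}{e^{-\lambda}\lambda^2/2!}$$

$$= 1 - 1/4 = 3/4$$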


9.3. Fix a point on the road and let $X_n$ equal 0 if the nth vehicle to pass is a car and let it equal 1 if it is a truck, $n \ge 1$. We now suppose that the sequence $X_n$, $n \ge 1$, is a Markov chain with transition probabilities

$$P_{0,0} = 5/6, \quad P_{0,1} = 1/6, \quad P_{1,0} = 4/5, \quad P_{1,1} = 1/5$$

Then the long-run proportion of times is the solution of

$$\pi_0 = \pi_0(5/6) + \pi_1(4/5)$$

$$\pi_1 = \pi_0(1/6) + \pi_1(1/5)$$

$$\pi_0 + \pi_1 = 1$$

Solving the preceding equations gives

$$\pi_0 = 24/29, \quad \pi_1 = 5/29$$

Thus, 2400/29 ≈ 83 percent of the vehicles on the road are cars.
9.4. The successive weather classifications constitute a Markov 
chain. If the states are 0 for rainy, 1 for sunny, and 2 for overcast, 
then the transition probability matrix is as follows: 
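$$P = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$

The long-run proportions satisfy

$$\pi_0 = \pi_1(1/3) + \pi_2(1/3)$$

$$\pi_1 = \pi_0(1/2) + \pi_1(1/3) + \pi_2(1/3)$$

$$\pi_2 = \pi_0(1/2) + \pi_1(1/3) + \pi_2(1/3)$$

$$1 = \pi_0 + \pi_1 + \pi_2$$

The solution of the preceding system of equations is

$$\pi_0 = 1/4, \quad \pi_1 = 3/8, \quad \pi_2 = 3/8$$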


Hence, three-eighths of the days are sunny and one-fourth are rainy. 
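The long-run proportions in Problems 9.3 and 9.4 can both be obtained by solving the stationary equations exactly. The solver below is a minimal sketch using Gauss-Jordan elimination over rational numbers; the function name is an illustrative choice, and it assumes the chain is irreducible so that the system has a unique solution.

```python
from fractions import Fraction as F

def stationary(P):
    """Solve pi = pi P together with sum(pi) = 1, exactly."""
    k = len(P)
    # Balance equations for states 0..k-2, then the normalization row.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(k)] + [F(0)]
         for j in range(k - 1)]
    A.append([F(1)] * k + [F(1)])
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]

vehicles = [[F(5, 6), F(1, 6)], [F(4, 5), F(1, 5)]]          # Problem 9.3
weather = [[F(0), F(1, 2), F(1, 2)],
           [F(1, 3), F(1, 3), F(1, 3)],
           [F(1, 3), F(1, 3), F(1, 3)]]                      # Problem 9.4
print(stationary(vehicles), stationary(weather))
```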
9.5. 


a. A direct computation yields 
H(X)/H(Y) = 1.06 


b. Both random variables take on two of their values with the 
same probabilities .35 and .05. The difference is that if they do 
not take on either of those values, then X, but not Y, is equally 
likely to take on any of its three remaining possible values. 
Hence, from Theoretical Exercise 9.13, we would expect 
the result of part (a). 


Chapter 10 


10.1. 
a. $1 = C\int_0^1 e^x\,dx \;\Rightarrow\; C = 1/(e - 1)$

b. $F(x) = C\int_0^x e^y\,dy = \dfrac{e^x - 1}{e - 1}, \quad 0 \le x \le 1$

Hence, if we let $X = F^{-1}(U)$, then

$$U = \frac{e^X - 1}{e - 1}$$

or

$$X = \log(U(e - 1) + 1)$$

Thus, we can simulate the random variable X by generating a random number U and then setting X = log(U(e − 1) + 1).
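A minimal sketch of this inverse transform step follows. The function names are illustrative, and `random.random()` stands in for the random number U.

```python
import math
import random

def sample_x(u):
    """Inverse transform: X = log(U(e - 1) + 1) for U in [0, 1)."""
    return math.log(u * (math.e - 1.0) + 1.0)

def cdf(x):
    """F(x) = (e^x - 1)/(e - 1) on [0, 1], used to check the inversion."""
    return (math.exp(x) - 1.0) / (math.e - 1.0)

x = sample_x(random.random())   # one simulated value of X
print(x)
```

Applying `cdf` to `sample_x(u)` should return u, which is a convenient way to confirm that the inversion was done correctly.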


10.2. Use the acceptance-rejection method with g(x) = 1, 0 < x < 1. Calculus shows that the maximum value of f(x)/g(x) occurs at a value of x, 0 < x < 1, such that

$$2x - 6x^2 + 4x^3 = 0$$

or, equivalently, when

$$4x^2 - 6x + 2 = (4x - 2)(x - 1) = 0$$

The maximum thus occurs when x = 1/2, and it follows that

$$C = \max f(x)/g(x) = 30(1/4 - 2/8 + 1/16) = 15/8$$

Hence, the algorithm is as follows:

Step 1. Generate a random number $U_1$.

Step 2. Generate a random number $U_2$.

Step 3. If $U_2 \le 16(U_1^2 - 2U_1^3 + U_1^4)$, set $X = U_1$; else return to Step 1.
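The three steps can be sketched as follows. The code assumes the target density is $f(x) = 30(x^2 - 2x^3 + x^4)$ on (0, 1), which is what the constants 30 and 15/8 above imply; the function name, the seed, and the sample size are illustrative choices.

```python
import random

def sample_by_rejection(rng):
    """Acceptance-rejection with a uniform proposal on (0, 1)."""
    while True:
        u1 = rng.random()                       # Step 1: candidate value
        u2 = rng.random()                       # Step 2: acceptance test number
        # Step 3: accept when U2 <= f(U1) / (C g(U1)) = 16 (U1^2 - 2 U1^3 + U1^4)
        if u2 <= 16.0 * (u1 ** 2 - 2.0 * u1 ** 3 + u1 ** 4):
            return u1

rng = random.Random(12345)                      # fixed seed, for repeatability
samples = [sample_by_rejection(rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)
```

Under the assumed density, which is symmetric about 1/2, the sample mean should be close to 0.5; on average each accepted value costs C = 15/8 iterations.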


10.3. It is most efficient to check the higher probability values first, as in the following algorithm:

Step 1. Generate a random number U.

Step 2. If U ≤ .35, set X = 3 and stop.

Step 3. If U ≤ .65, set X = 4 and stop.

Step 4. If U ≤ .85, set X = 2 and stop.

Step 5. Set X = 1.

10.4. $2\mu - X$
10.5.
a. Generate 2n independent exponential random variables with mean 1, $X_i, Y_i$, i = 1, ..., n, and then use the estimator $\sum_{i=1}^{n} e^{X_i Y_i}/n$.

b. We can use XY as a control variate to obtain an estimator of the type

$$\sum_{i=1}^{n} (e^{X_i Y_i} + cX_i Y_i)/n$$

Another possibility would be to use $XY + X^2Y^2/2$ as the control variate and so obtain an estimator of the type

$$\sum_{i=1}^{n} (e^{X_i Y_i} + c[X_i Y_i + X_i^2 Y_i^2/2 - 1/2])/n$$

The motivation behind the preceding formula is based on the fact that the first three terms of the MacLaurin series expansion of $e^{xy}$ are $1 + xy + (x^2y^2)/2$.

Bernoulli random variable, 137, 143

Bernoulli trials, 114 

Bernstein polynomials, 427 

Bertrand’s paradox, 199-200 

best prize problem, 350-351 

beta binomial random variable, 299 


beta distribution, 221, 233, 298 


binary symmetric channel, 445 
binomial coefficients, 7 
binomial random variable, 137, 174 
normal approximation, 207-210 
approximation to hypergeometric, 165 
computing its mass function, 145 
moments of, 322—323 
simulation of, 461 
sums of independent, 266—267, 366 
with randomly chosen success probability, 351-352 
binomial theorem, 7, 8 
birthday problem, 38, 246-247 
bivariate exponential distribution, 300 
bivariate normal distribution, 272-274, 343-344, 372, 373 
Bonferroni’s inequality, 55, 391 
Boole’s inequality, 33, 57, 306-307 
Borel, 409
branching process, 389
bridge, 37-38, 60 


Buffon’s needle problem, 249-250, 296 


Cantor distribution, 387 

Cauchy distribution, 220-221 
Cauchy-Schwarz inequality, 387 

center of gravity, 128 

central limit theorem, 200, 397-399, 405 


channel capacity, 446
Chapman-Kolmogorov equations, 434-435
Chebychev’s inequality, 395 
one-sided, 410, 411 
and weak law of large numbers, 397 
Chebychev’s sum inequality, 429 
Chernoff bound, 412-413 
for binomial, 413-414 
for standard normal, 413 
for Poisson, 413 
chi-squared distribution, 261 
density function, 261 
relation to gamma distribution, 261 
simulation of, 459 
coding theory, 441 
and entropy, 443 
combinations, 5, 6 
combinatorial analysis, 1 
combinatorial identities, 18, 19, 20, 21 
commutative law for events, 24 
complement of an event, 24 
complete graph, 93-94 
computing expectations by conditioning, 339-340 
computing probabilities by conditioning, 64-71, 72-78, 102, 349
concave function, 415 
conditional covariance formula, 387 
conditional cumulative distribution function, 270 
conditional distribution, 267, 274 


continuous case, 270 
discrete case, 267
conditional expectation, 337, 338 
use in prediction, 356 
use in simulation, 463 
conditional independence, 100 
conditional probability, 58, 59 
as a long run relative frequency, 64 
as a probability function, 95 
satisfying axioms of probability, 95-96 
conditional probability density function, 270 
conditional probability distribution function, 267 
conditional probability mass function, 267 
conditional variance, 354 
conditional variance formula, 354 
and simulation, 463 
continuity correction, 208 
continuity property of probabilities, 44-46 
continuous random variables, 189 
control variate, 465 
convex function, 415 
convolution, 258 
correlation, 334 
coupon collecting problems, 121-123, 309-310, 320-321, 324-325, 327-328 
covariance, 329 
inequality, 415-417 
craps, 341-343 
cumulative distribution function, 123, 174 


properties of, 172-174 
de Mere, Chevalier, 84

DeMoivre, A., 200, 210, 211, 399 

DeMoivre-Laplace limit theorem, 207 

DeMorgan’s laws, 26 

dependent events, 78 

dependent random variables, 248 

deviations, 331 

discrete random variables, 123, 174 

distribution function, see cumulative distribution function
distribution of a function of a random variable, 224-226 
distribution of a sum of independent random variables, 353-354
distributive law for events, 24 

DNA match, 76-78

dominant genes, 108 

double exponential distribution, see Laplace distribution


doubly stochastic matrix, 447-448


Ehrenfest urn model, 434, 437 
entropy, 439 
ergodic Markov chain, 436 
Erlang distribution, 219 
evaluating evidence, 70-72 
events, 23 
decreasing sequence of, 44 
increasing sequence of, 44 
independent, 78 
mutually exclusive, 24 


exchangeable random variables, 287-288 
expectation, 126, 174, 303, 375-376
of a beta random variable, 222 
of a binomial random variable, 142, 170, 307 
as a center of gravity, 128 
of a continuous random variable, 193 
of an exponential random variable, 212 
of a function of a random variable, 128, 194 
of a gamma random variable, 219 
of a geometric random variable, 159-160 
of a hypergeometric random variable, 166, 170, 308 
of a negative binomial random variable, 162, 307-308 
of a nonnegative random variable, 162 
of a normal random variable, 202-203
of number of matches, 309 
of number of runs, 310-311 
of a Pareto random variable, 223-224 
of a Poisson random variable, 148 
of sums of a random number of random variables, 340-341 
of sums of random variables, 167-170, 305 
of the number of successes, 170 
of a uniform random variable, 198 
tables of, 364, 365 
expected value, see expectation 
exponential random variables, 211 
moments of, 232 
rate of, 216 
relation to half life, 255-257 


simulation of, 453 
sums of, 261


failure rate function, see hazard rate function
Fermat, P., 84, 89 

Fermat’s combinatorial identity, 18 

first digit distribution, 175-176 

first moment, see mean 

frequency interpretation of probability, 27, 126 


friendship paradox, 134-135, 178 


Galton, F., 405-406 

gambler’s ruin problem, 87-89, 346-348 
application to drug testing, 89-91
multiple player, 86-87 

game theory, 177 

gamma random variables, 218, 297 
relation to chi-squared distribution, 219, 261-262, 296 
relation to exponential random variables, 261 
relation to Poisson process, 218-219 
simulation of, 454 
sums of independent, 261 

gamma function, 218, 261 
relation to beta function, 284 

Gauss, K. F., 210, 211

Gaussian distribution, see normal distribution 

genetics, 108, 109-110 

geometric random variable, 158, 174-175 
simulation of, 460-461 


geometrical probability, 199 
Gini index, 423-424


Hamiltonian path, 317-318 
hazard rate function, 215-216, 296 
relation to distribution function, 216 
Huygens, C., 84, 89 
hypergeometric random variable, 163-164, 175 
relation to binomial, 165 


moments of, 323-324 


importance sampling, 467 
inclusion-exclusion identity, 31-32, 314-315 
inclusion-exclusion bounds, 32-33
independent events, 78, 80, 103 
conditional, 100 
independent increments, 430 
independent random variables, 247-248, 252, 253, 257, 258 
indicator random variables, 127
information, 439 
interarrival times of a Poisson process, 431-432 
integer solution of equations, 12-13
intersection of events, 23, 24 
inverse transform method, 453 


discrete, 459-460 


Jensen’s inequality, 415 

joint cumulative probability distribution function, 237, 245 
joint moment generating function, 370-371 

joint probability density function, 241, 245 


of functions of random variables, 280, 285 
joint probability mass function, 238


jointly continuous random variables, 241, 245 


k-of-n system, 109 

keno, 182 

Khintchine, 397 

knockout tournament, 11-12 
Kolmogorov, A., 409 


Kullback-Leibler divergence, 427


Laplace, P., 200, 399, 405-406 
Laplace distribution, 214-215 
Laplace’s rule of succession, 100-101, 115, 116 
law of frequency of errors, 406 
law of total probability, 72 
laws of large numbers, 394 
Legendre’s theorem, 233 
Liapounoff, 399 
limit of events, 44 
linear prediction, 359
lognormal distribution, 226-227, 265
Lorenz curve, 420, 426, 428 
of an exponentially distributed population, 422 
of a Pareto distributed population, 422 
of a uniformly distributed population, 421 


properties of, 422-423 


marginal probability mass function, 245 
Markov chain, 432-433


Markov’s inequality, 394-395 
matching problem, 41-42, 55-56, 62-63, 99-100, 109, 324
maximum likelihood estimates, 183 
maximums-minimums identity, 319-320 
mean of a random variable, 132 
measurable events, 29 
median of a random variable, 232, 385 
memoryless random variable, 213, 214 
Mendel, G., 139 
midrange, 298 
minimax theorem, 177 
mode of a random variable, 232 
moment generating function, 360 
of a binomial random variable, 361
of a chi-squared random variable, 367 
of an exponential random variable, 363 
of a normal random variable, 363-364 
of a Poisson random variable, 362 
of a sum of independent random variables, 364 
of a sum of a random number of random variables, 367-368
tables for, 364, 365 
moments of a random variable, 132 
of the number of events that occur, 322 
multinomial coefficients, 11 
multinomial distribution, 246, 268, 336 
multinomial theorem, 11 
multiplication rule of probability, 61-62, 102 
multivariate normal distribution, 371 


joint moment generating function, 372 
mutually exclusive events, 24


NCAA tournament, 91-93 
negative binomial random variable, 160, 161, 175 
relation to binomial, 185 
relation to geometric, 161 
negative hypergeometric random variable, 188, 325-327 
Newton, I., 211 
noiseless coding theorem, 442-443
noisy coding theorem, 446 
normal random variables, 200, 281-282 
approximations to binomial, 207-208 
characterization of, 250-251 
joint distribution of sample mean and sample variance, 373-375 
moments of, 390 
simulation, 283-284 
simulation by polar method, 457-459 
simulation by rejection method, 455-457
sums of independent, 262-263, 366-367
null event, 24 


null set, 24 


odds of an event, 70-71, 102
order statistics, 276
density of jth, 278


joint density of, 276, 279 


parallel system, 81 
Pareto, 167 


Pareto random variable, 223, 275 
partition, 55, 57
Pascal, B., 84
Pascal random variables, see negative binomial random variables
Pascal's identity, 7 
Pearson, K., 210 
permutations, 3 
personal view of probability, 48 
Poisson, S., 146 
Poisson paradigm, 151 
Poisson process, 155-157, 430-432 
Poisson random variables, 146, 174, 248-249, 268 
as an approximation to binomial, 147 
as an approximation to the number of events that occur, 149-151 
bounds on the error of a Poisson approximation, 418-420
bounds on its probabilities, 413, 427 
computing its probabilities, 158 
simulation of, 461-462 
sums of independent, 266, 366 
poker, 36, 37 
poker dice, 51 
polar algorithm, 457-459
Polya’s urn model, 289 
posterior probability, 101 
power law density, 224 
prior probability, 101 
probabilistic method, 94, 317 
probability of an event, 27 


as a continuous set function, 44-46
as a limiting proportion, 26-27
as a measure of belief, 48-49 
probability density function, 189 
of a function of a random variable, 225 
relation to cumulative distribution function, 192 
probability mass function, 123, 174 
relation to cumulative distribution function, 125 


problem of the points, 84-85, 161


quicksort algorithm, 312-314 


random number, 391, 451 

pseudo, 451 
random number generator, 451 
random permutation, 393, 451-452
random subset, 253-254 
random variables, 119, 174 
random walk, 435 
range of a random sample, 279-280 
rate of the exponential, 216 
Rayleigh density function, 216, 283 
record value, 387 
reduced sample space, 60 
rejection method of simulation, 454-455
relative frequency definition of probability, 27 
Riemann zeta function, 167 
round robin tournament, 115 
runs, 43-44, 97-98, 151-155 


longest, 151-155 
sample mean, 306

sample median, 278 

sample space, 22 

sample variance, 331 

sampling from a finite population, 209-210 

sampling with replacement, 52 

sequential updating of information, 101-102 

serve and rally games, 85-86

Shannon, C., 446 

signal to noise ratio, 426 

simulation, 451 

St. Petersburg paradox, 178 

standard deviation, 137, 174 
inequality, 393 

standard normal distribution function, 203-204
bounds, 427 
table of, 204 

standard normal random variable, 202 
moments, 389 

stationary increments, 430 

Stieltjes integral, 375-376 

Stirling’s approximation, 144 

stochastically larger, 386 

strong law of large numbers, 406-408 

subjective probability, see personal probability 

subset, 24 

superset, 24 


surprise, 438-439 


t distribution, 271-272 

transition probabilities, 433 
n-stage, 434

trials, 80 

triangular distribution, 259 


twin problem, 70 


uncertainty, 439-441 
uncorrelated random variables, 335 
uniform random variables, 197 
sums of independent, 258-260 
union of events, 23, 24 
probability formula for, 29-32 
unit normal random variable, see standard normal random variable


utility, 131-132


value at risk, 206-207
variance, 132, 133, 136, 174 
of a beta random variable, 223 
of a binomial random variable, 142-143, 171, 332 
of a continuous random variable, 196 
of an exponential random variable, 212 
of a gamma random variable, 219 
of a geometric random variable, 160, 345-346 
of a hypergeometric random variable, 166-167, 171-172 
as a moment of inertia, 137 
of a negative binomial random variable, 162 
of a normal random variable, 202-203


of a Pareto random variable, 224 
of a Poisson random variable, 149
of a sum of a random number of random variables, 355 
of sums of random variables, 330 
of a uniform random variable, 199 
of the number of successes, 171-172 
tables for, 364, 365 
Venn diagrams, 24, 25 


von Neumann, J., 177 


waiting times of a Poisson process, 432 
weak law of large numbers, 397 
Weibull distribution, 219-220 

relation to exponential, 233 


Weierstrass theorem, 427 
Yule-Simon distribution, 183 


zeta distribution, 167 


Zipf distribution, see zeta distribution 
Common Discrete Distributions


• Bernoulli(p) X indicates whether a trial that results in a success with probability
p is a success or not.

$P\{X = 1\} = p$

$P\{X = 0\} = 1 - p$

$E[X] = p$, $\mathrm{Var}(X) = p(1 - p)$.
• Binomial(n,p) X represents the number of successes in n independent trials
when each trial is a success with probability p.

$P\{X = i\} = \binom{n}{i} p^i (1 - p)^{n-i}$, $i = 0, 1, \ldots, n$

$E[X] = np$, $\mathrm{Var}(X) = np(1 - p)$.
Note. Binomial(1,p) = Bernoulli(p).
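
As a numerical check of the binomial formulas, the following Python sketch evaluates the mass function and recovers the mean and variance by direct summation. The function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative assumptions, not part of the text.

```python
from math import comb

def binomial_pmf(i, n, p):
    """P{X = i} for a Binomial(n, p) random variable."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.3  # illustrative values, chosen arbitrarily
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]

# Mean and variance computed directly from the mass function.
mean = sum(i * q for i, q in enumerate(pmf))
var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))

print(sum(pmf), mean, var)  # should be close to 1, np, and np(1 - p)
```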
• Geometric(p) X is the number of trials needed to obtain a success when each
trial is independently a success with probability p.

$P\{X = i\} = p(1 - p)^{i-1}$, $i = 1, 2, \ldots$

$E[X] = \frac{1}{p}$, $\mathrm{Var}(X) = \frac{1 - p}{p^2}$.
• Negative Binomial(r,p) X is the number of trials needed to obtain a total of r
successes when each trial is independently a success with probability p.

$P\{X = i\} = \binom{i-1}{r-1} p^r (1 - p)^{i-r}$, $i = r, r+1, r+2, \ldots$

$E[X] = \frac{r}{p}$, $\mathrm{Var}(X) = r\,\frac{1 - p}{p^2}$.
Note.

1. Negative Binomial(1,p) = Geometric(p).

2. Sum of r independent Geometric(p) random variables is Negative
Binomial(r,p).
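
The second note can be checked numerically by convolving the geometric mass function with itself. The sketch below is one way to do so; the helper `convolve`, the truncation point `max_value`, and the parameter values are assumptions made for illustration.

```python
from math import comb

def geometric_pmf(i, p):
    """P{X = i} for Geometric(p); zero for i < 1."""
    return p * (1 - p) ** (i - 1) if i >= 1 else 0.0

def negative_binomial_pmf(i, r, p):
    """P{X = i} for Negative Binomial(r, p); zero for i < r."""
    return comb(i - 1, r - 1) * p**r * (1 - p) ** (i - r) if i >= r else 0.0

def convolve(f, g, max_value):
    """Mass function of the sum of two independent nonnegative integer
    variables with mass functions f and g, for values 0..max_value."""
    return [sum(f[j] * g[k - j] for j in range(k + 1))
            for k in range(max_value + 1)]

p, r, max_value = 0.4, 3, 30  # illustrative values
geo = [geometric_pmf(i, p) for i in range(max_value + 1)]

# Add up r independent geometrics by repeated convolution.
total = geo
for _ in range(r - 1):
    total = convolve(total, geo, max_value)

max_diff = max(abs(total[i] - negative_binomial_pmf(i, r, p))
               for i in range(max_value + 1))
print(max_diff)  # should be at the level of floating-point rounding
```

The truncation does not affect the comparison: the convolution at value k uses only terms with indices at most k.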


• Poisson(λ) X is used to model the number of events that occur over a set
interval when these events are either independent or weakly dependent and each
has a small probability of occurrence.

$P\{X = i\} = e^{-\lambda} \lambda^i / i!$, $i = 0, 1, 2, \ldots$

$E[X] = \lambda$, $\mathrm{Var}(X) = \lambda$.
Note.

1. A Poisson random variable X with parameter λ = np provides a good
approximation to a Binomial(n,p) random variable when n is large and p is
small.

2. If events are occurring one at a time in a random manner for which (a) the
number of events that occur in disjoint time intervals is independent and
(b) the probability of an event occurring in any small time interval is
approximately λ times the length of the interval, then the number of events
in an interval of length t will be a Poisson(λt) random variable.
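
To see how close the approximation in the first note can be, the following sketch compares the two mass functions term by term. The values n = 1000 and p = 0.003 are arbitrary illustrative choices, and the function names are not from the text.

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    """P{X = i} for Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson_pmf(i, lam):
    """P{X = i} for Poisson(lam)."""
    return exp(-lam) * lam**i / factorial(i)

n, p = 1000, 0.003  # n large, p small (illustrative)
lam = n * p

# Largest pointwise gap between the two mass functions over i = 0..20.
max_gap = max(abs(binomial_pmf(i, n, p) - poisson_pmf(i, lam))
              for i in range(21))
print(lam, max_gap)
```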


• Hypergeometric(m, N - m, n) X is the number of white balls in a random sample
of n balls chosen without replacement from an urn of N balls of which m are
white.

$P\{X = i\} = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$, $i = 0, 1, \ldots, \min(m, n)$

The preceding uses the convention that $\binom{r}{j} = 0$ if either $j < 0$ or $j > r$.

With $p = m/N$, $E[X] = np$, $\mathrm{Var}(X) = \frac{N - n}{N - 1}\, np(1 - p)$.

Note. If each ball were replaced before the next selection, then X would be a
Binomial(n,p) random variable.
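
A short sketch, under assumed values N = 20, m = 6, n = 5, that evaluates the hypergeometric mass function with the stated convention for binomial coefficients and checks the mean and variance formulas by summation. The function name is hypothetical.

```python
from math import comb

def hypergeometric_pmf(i, m, N, n):
    """P{X = i}: i white balls in a sample of n drawn without replacement
    from N balls, m of which are white."""
    # Convention: a binomial coefficient with an out-of-range index is 0.
    if i < 0 or i > m or n - i < 0 or n - i > N - m:
        return 0.0
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

N, m, n = 20, 6, 5  # illustrative values
p = m / N
pmf = [hypergeometric_pmf(i, m, N, n) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))

print(mean, n * p)                               # the two should agree
print(var, (N - n) / (N - 1) * n * p * (1 - p))  # the two should agree
```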


• Negative Hypergeometric X is the number of balls that need be removed from
an urn that contains n + m balls, of which n are white, until a total of r white balls
has been removed, where r ≤ n.

$P\{X = k\} = \frac{\binom{n}{r-1}\binom{m}{k-r}}{\binom{n+m}{k-1}} \cdot \frac{n - r + 1}{n + m - k + 1}$, $k \ge r$

$E[X] = r\,\frac{n + m + 1}{n + 1}$, $\mathrm{Var}(X) = \frac{mr(n + 1 - r)(n + m + 1)}{(n + 1)^2 (n + 2)}$


Common Continuous Distributions 


• Uniform(a,b) X is equally likely to be near each value in the interval (a,b). Its
density function is

$f(x) = \frac{1}{b - a}$, $a < x < b$

$E[X] = \frac{a + b}{2}$, $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$.
• Normal(μ, σ²) X is a random fluctuation arising from many causes. Its density
function is

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$, $-\infty < x < \infty$

$E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$.
Notes.
1. When μ = 0, σ = 1, X is called a standard normal.
2. If X is Normal(μ, σ²), then $Z = \frac{X - \mu}{\sigma}$ is standard normal.

3. Sum of independent normal random variables is also normal.




4. An important result is the central limit theorem, which states that the 
distribution of the sum of the first n of a sequence of independent and 
identically distributed random variables becomes normal as n goes to 
infinity, for any distribution of these random variables that has a finite mean 
and variance. 
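
The central limit theorem and the standardization in note 2 can be illustrated with a Binomial(n, p) variable, which is a sum of n independent Bernoulli(p) variables. The sketch below compares an exact binomial probability with its normal approximation using a continuity correction; the values n = 400, p = 0.5, k = 210 are illustrative assumptions, and `NormalDist` is the standard-library class in Python's `statistics` module.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """P{X <= k} for Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, k = 400, 0.5, 210  # illustrative values
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(k, n, p)

# Normal approximation with a continuity correction of 0.5.
approx = NormalDist(mu, sigma).cdf(k + 0.5)

# The same probability after standardizing: Z = (X - mu) / sigma.
z = (k + 0.5 - mu) / sigma
standardized = NormalDist().cdf(z)

print(exact, approx, standardized)
```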


• Exponential(λ) X is the waiting time until an event occurs when events are
occurring at a rate λ > 0. Its density is

$f(x) = \lambda e^{-\lambda x}$, $x \ge 0$

$E[X] = \frac{1}{\lambda}$, $\mathrm{Var}(X) = \frac{1}{\lambda^2}$, $P\{X > x\} = e^{-\lambda x}$, $x \ge 0$.
Note. X is memoryless, in that the remaining life of an item whose life distribution
is Exponential(λ) is also Exponential(λ), no matter what the current age of the
item is.
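
The memoryless property says that $P\{X > s + t \mid X > s\} = P\{X > t\}$. A minimal numerical check, with arbitrary illustrative values for the rate and the two times:

```python
from math import exp

def survival(x, lam):
    """P{X > x} for an Exponential(lam) random variable, x >= 0."""
    return exp(-lam * x)

lam, s, t = 0.5, 2.0, 3.0  # illustrative values

# Probability of surviving a further t units, given survival to age s.
conditional = survival(s + t, lam) / survival(s, lam)
unconditional = survival(t, lam)

print(conditional, unconditional)  # the two should agree
```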
• Gamma(α, λ) When α = n, X is the waiting time until n events occur when events
are occurring at a rate λ > 0. Its density is

$f(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}$, $x > 0$

where $\Gamma(\alpha) = \int_0^\infty e^{-x} x^{\alpha - 1}\, dx$ is called the gamma function.

$E[X] = \frac{\alpha}{\lambda}$, $\mathrm{Var}(X) = \frac{\alpha}{\lambda^2}$.
Note.
1. Gamma(1,λ) = Exponential(λ).
2. If the random variables are independent, then the sum of a Gamma(α₁,λ)
and a Gamma(α₂,λ) is a Gamma(α₁ + α₂,λ).
3. The sum of n independent and identically distributed exponentials with
parameter λ is a Gamma(n,λ) random variable.
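
Note 3 suggests a simple way to simulate a Gamma(n, λ) random variable: add n independent exponentials. The sketch below does this with the standard-library generator and compares the sample mean and variance with n/λ and n/λ². The seed, sample size, and parameter values are arbitrary assumptions; the agreement is only approximate because the estimates are random.

```python
import random

def gamma_by_summing_exponentials(n, lam, rng):
    """One draw of a Gamma(n, lam) variable as a sum of n exponentials."""
    return sum(rng.expovariate(lam) for _ in range(n))

rng = random.Random(12345)           # fixed seed for reproducibility
n, lam, trials = 3, 2.0, 100_000     # illustrative values

samples = [gamma_by_summing_exponentials(n, lam, rng) for _ in range(trials)]
sample_mean = sum(samples) / trials
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (trials - 1)

print(sample_mean, n / lam)       # should be close
print(sample_var, n / lam ** 2)   # should be close
```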


• Beta(a,b) X is the distribution of a random variable taking on values in the
interval (0, 1). Its density is

$f(x) = \frac{1}{B(a,b)}\, x^{a-1} (1 - x)^{b-1}$, $0 < x < 1$

where $B(a,b) = \int_0^1 x^{a-1} (1 - x)^{b-1}\, dx$ is called the beta function.

$E[X] = \frac{a}{a + b}$, $\mathrm{Var}(X) = \frac{ab}{(a + b)^2 (a + b + 1)}$
Note.

1. Beta(1, 1) = Uniform(0,1).

2. The jth smallest of n independent Uniform(0, 1) random variables is a
Beta(j, n - j + 1) random variable.
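
The beta function and the moments of a Beta(a, b) random variable can be checked by numerical integration. The sketch below uses a plain midpoint rule (the helper `integrate` is hypothetical, as are the values a = 2, b = 3) and also compares B(a, b) with the standard identity Γ(a)Γ(b)/Γ(a + b), which is not stated in this summary.

```python
from math import gamma

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

a, b = 2.0, 3.0  # illustrative values

# Beta function from its integral definition.
beta_ab = integrate(lambda x: x ** (a - 1) * (1 - x) ** (b - 1), 0.0, 1.0)

def density(x):
    """Beta(a, b) density on (0, 1)."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_ab

mean = integrate(lambda x: x * density(x), 0.0, 1.0)
var = integrate(lambda x: (x - mean) ** 2 * density(x), 0.0, 1.0)

print(beta_ab, gamma(a) * gamma(b) / gamma(a + b))
print(mean, a / (a + b))
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))
```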


• Chi-Squared(n) X is the sum of the squares of n independent standard
normal random variables. Its density is

$f(x) = \frac{e^{-x/2}\, x^{\frac{n}{2} - 1}}{2^{n/2}\, \Gamma(n/2)}$, $x > 0$

$E[X] = n$, $\mathrm{Var}(X) = 2n$.
Notes.
1. Chi-Squared(n) = Gamma(n/2, 1/2).
2. The sample variance of n independent and identically distributed Normal
(μ, σ²) random variables multiplied by $\frac{n - 1}{\sigma^2}$ is a Chi-Squared(n - 1)
random variable, and it is independent of the sample mean.
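
Note 1 can be verified pointwise: the chi-squared density with n degrees of freedom coincides with the Gamma(n/2, 1/2) density. A minimal sketch, with function names and test points chosen only for illustration:

```python
from math import exp, gamma

def gamma_density(x, alpha, lam):
    """Gamma(alpha, lam) density at x > 0."""
    return lam * exp(-lam * x) * (lam * x) ** (alpha - 1) / gamma(alpha)

def chi_squared_density(x, n):
    """Chi-Squared(n) density at x > 0."""
    return exp(-x / 2) * x ** (n / 2 - 1) / (2 ** (n / 2) * gamma(n / 2))

n = 5                               # illustrative degrees of freedom
points = [0.5, 1.0, 2.5, 7.0, 12.0]  # illustrative evaluation points

max_diff = max(abs(chi_squared_density(x, n) - gamma_density(x, n / 2, 0.5))
               for x in points)
print(max_diff)  # should be at the level of floating-point rounding
```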


• Cauchy X is the tangent of a uniformly distributed random angle between -π/2
and π/2. Its density is

$f(x) = \frac{1}{\pi (1 + x^2)}$, $-\infty < x < \infty$

E[X] is undefined.


• Pareto(λ, a) If Y is exponential with rate λ and a > 0, then $X = a e^{Y}$ is said to
be Pareto with parameters λ and a. Its density is

$f(x) = \lambda a^{\lambda} x^{-(\lambda + 1)}$, $x > a$

When λ > 1, $E[X] = \frac{\lambda a}{\lambda - 1}$, and when λ > 2, $\mathrm{Var}(X) = \frac{\lambda a^2}{(\lambda - 2)(\lambda - 1)^2}$.
Note.

The conditional distribution of X given that it exceeds x₀ > a is Pareto(λ, x₀).
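
The note follows from the definition $X = a e^{Y}$, since $P\{X > x\} = P\{Y > \log(x/a)\}$ for x > a. The sketch below checks it numerically by comparing the conditional tail probability with the tail of a Pareto(λ, x₀) variable; the function names and parameter values are illustrative assumptions.

```python
from math import exp, log

def exponential_survival(y, lam):
    """P{Y > y} for Y exponential with rate lam."""
    return exp(-lam * y) if y > 0 else 1.0

def pareto_survival(x, lam, a):
    """P{X > x} for X = a * e^Y, computed as P{Y > log(x / a)}."""
    return exponential_survival(log(x / a), lam)

lam, a, x0, x = 2.5, 1.0, 3.0, 7.0  # illustrative values with a < x0 < x

# P{X > x | X > x0} for a Pareto(lam, a) variable ...
conditional = pareto_survival(x, lam, a) / pareto_survival(x0, lam, a)
# ... against P{X > x} for a Pareto(lam, x0) variable.
direct = pareto_survival(x, lam, x0)

print(conditional, direct)  # the two should agree
```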